FOREWORD


The Software Engineering Laboratory (SEL) is an organization sponsored by the National Aeronautics and
Space Administration/Goddard Space Flight Center (NASA/GSFC) and created to investigate the effective-
ness of software engineering technologies when applied to the development of applications software. The
SEL was created in 1976 and has three primary organizational members:

NASA/GSFC, Information Systems Center

The University of Maryland, Department of Computer Science

Computer Sciences Corporation, Development and Sustaining Engineering Organization

The goals of the SEL are (1) to understand the software development process in the GSFC environment; (2)
to measure the effects of various methodologies, tools, and models on this process; and (3) to identify and
then to apply successful development practices. The activities, findings, and recommendations of the SEL
are recorded in the Software Engineering Laboratory Series, a continuing series of reports that includes this
document.

Documents from the Software Engineering Laboratory Series can be obtained via the SEL homepage at:
http://sel.gsfc.nasa.gov/ 
or by writing to: 

Systems Integration and Engineering Branch 
Code 581 

Goddard Space Flight Center
Greenbelt, Maryland 20771

Experiences in Using the Goal/Question/Metric Paradigm

Rini van Solingen 

Tokheim, The Netherlands 

Eindhoven University of Technology, The Netherlands 

R.V.Solingen@tm.tue.nl
http://www.gqm.nl/

Abstract 

Tokheim, a company that provides products and services for the retail petroleum market,
has applied the Goal/Question/Metric (GQM) paradigm to support software development
projects at its central development site in the Netherlands since 1994. Many
experiences have been gathered during these projects. They include knowledge
of software development topics and of practical GQM application in industry.

The presentation will address a selection of experiences, lessons learned and
measurement examples collected during the past years.


GQM experiences published in a book 

These experiences have also been published recently in the McGraw-Hill book: ‘The 
Goal/Question/Metric method: A practical guide for quality improvement of software 
development’ by Rini van Solingen and Egon Berghout. ISBN 0-07-709553-7. 

This book contains practical procedures for GQM application in industry; more than
50% of it consists of practical results and documents from GQM application in four
Tokheim projects.


Foreword by Professor Victor R. Basili to the GQM book 

The original ideas for the Goal Question Metric Paradigm came from the need to solve a 
practical problem back in the late 1970s. How do you decide what you need to measure in 
order to achieve your goals? We (Dr. David Weiss and I) faced the problem when trying 
to understand the types of changes (modifications and defects) being made to a set of 
flight dynamics projects at NASA Goddard Space Flight Center. Was there a pattern to 
the changes? If we understood them could we anticipate them and possibly improve the 
development processes to deal with them? At the same time, we were trying to use 
change data to evaluate the effects of applying the Software Cost Reduction methodology 
on the A-7 project requirements document at the Naval Research Laboratory. 



Writing goals allowed us to focus on the important issues. Defining questions allowed us 
to make the goals more specific and suggested the metrics that were relevant to the goals. 
The resulting GQM lattice allowed us to see the full relationship between goals and 
metrics, determine what goals and metrics were missing or inconsistent, and provide a 
context for interpreting the data after it was collected. It permitted us to maximize the set 
of goals for a particular data set and minimize the data required by recognizing where one 
metric could be substituted for another. 
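
The lattice described above can be illustrated with a small sketch. The Python code below is only an illustration under stated assumptions: the class names, helper functions, and example goals, questions, and metrics are hypothetical and are not taken from the SEL or from the GQM book. It shows how linking goals to questions and questions to metrics makes it possible to spot metrics that serve several goals and questions that still lack a metric.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: set[str] = field(default_factory=set)


@dataclass
class Goal:
    text: str
    questions: list[Question] = field(default_factory=list)

    def metrics(self) -> set[str]:
        """All metrics reachable from this goal through its questions."""
        found: set[str] = set()
        for question in self.questions:
            found |= question.metrics
        return found


def shared_metrics(goals: list[Goal]) -> dict[str, list[str]]:
    """Map each metric that serves more than one goal to those goals."""
    usage: dict[str, list[str]] = {}
    for goal in goals:
        for metric in sorted(goal.metrics()):
            usage.setdefault(metric, []).append(goal.text)
    return {metric: used_by for metric, used_by in usage.items() if len(used_by) > 1}


def unanswered_questions(goals: list[Goal]) -> list[str]:
    """Questions without any metric attached, i.e. gaps in the lattice."""
    return [q.text for g in goals for q in g.questions if not q.metrics]


# Hypothetical example data, loosely modeled on a change-analysis setting.
goals = [
    Goal("Characterize changes", [
        Question("How are changes distributed by type?", {"change_type", "change_count"}),
        Question("Which components change most often?", {"component", "change_count"}),
    ]),
    Goal("Evaluate effort spent on changes", [
        Question("How much effort does a change take?", {"effort_hours", "change_count"}),
        Question("Is effort related to change cause?"),
    ]),
]

print(shared_metrics(goals))
print(unanswered_questions(goals))
```

In this sketch, a metric that appears under several goals is a candidate for collecting once and reusing, and a question without metrics marks a place where the measurement plan is still incomplete.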

The process established the way we did measurement in the Software Engineering 
Laboratory at Goddard Space Flight Center, and has evolved over time, based upon use. 
Expansion involved the application to other areas of measurement (such as effort, 
schedule, process conformance), the development of the goal templates, the development 
of support processes, the formalization of the questions into models, and the embedding
of measurement in an evolutionary feedback loop, the Quality Improvement Process and
the Experience Factory Organization. Professor Dieter Rombach was a major contributor 
to this expansion. 

The GQM paradigm represents a practical approach for bounding the measurement 
problem. It provides an organization with a great deal of flexibility, allowing it to focus 
its measurement program on its own particular needs and culture. It is based upon two
basic assumptions: (1) that a measurement program should not be ‘metrics-based’ but
‘goal-based’ and (2) that the definition of goals and measures needs to be tailored to the
individual organization. However, these assumptions make the process more difficult
than just offering people a “collection of metrics” or a standard predefined set of goals 
and metrics. It requires that the organization make explicit its own goals and processes. 

In this book, Rini van Solingen and Egon Berghout provide the reader with an excellent 
and comprehensive synthesis of the GQM concepts, packaged with the support necessary 
for building an effective measurement program. It provides more than the GQM, but 
describes it in the philosophy of the Quality Improvement Paradigm and the Experience 
Factory Organization. Based upon experience, they have organized the approach in a 
step-by-step set of procedures, offering experience-based heuristics that I recognize as 
effective. They have captured the best ideas and offer them in a straightforward manner.
In reading this book, I found myself constantly nodding in agreement, finding many ideas 
I had not articulated as well. They offer several examples that can be used as templates 
for those who wish to have a standard set of goals and metrics as an initial iteration. 

If you work on a measurement program, you should keep this book with you as the 
definitive reference for ideas and procedures. 

Professor Victor R. Basili 
University of Maryland 
and 

Fraunhofer Center for Experimental Software Engineering, Maryland 



About the presenter 


Rini van Solingen (M.Sc.) has been working as a senior software quality engineer at 
Tokheim and as a research fellow at Eindhoven University of Technology, since 1994. 
During this period he worked on all Tokheim GQM projects and performed research on 
software process improvement and measurement. He has published over 50 publications 
in international journals and conference proceedings. He is a member of the IEEE
Computer Society and is an active reviewer for IEEE Software.

EXPERIMENTATION


Engine for Applied Research and Technology Transfer 
in Software Engineering 

Dieter Rombach 

University of Kaiserslautern 
Computer Science Department 
Software Engineering Chair 
Kaiserslautern, Germany 
& 

Fraunhofer Institute for Experimental Software Engineering (IESE)

Kaiserslautern, Germany 


Abstract: The empirical work in NASA’s Software Engineering Laboratory (SEL) in
the 1970s and 1980s has contributed significantly to the maturing of the sub-
discipline of ‘experimental software engineering’. The development of
experimental technologies ranging from the GQM approach for measurement to
the EF approach for organizational learning provided the scientific basis; the
successful experiments within the SEL development environment served as
reference examples for others. The Fraunhofer Institute for
Experimental Software Engineering (IESE) was founded in Germany based on
the successful SEL principles. It was charged with speeding up the transfer of
innovative software engineering technologies into a wide variety of industry
sectors. The concepts of experimentation were developed further and used for a
wide range of purposes from applied research to technology transfer and
training. Even in its short history, IESE has established a track record of
transferring innovative technologies quickly and with sustained success.
This presentation focuses on the adaptation of the successful SEL
concepts to a different environment, surveys the wide range of applications of
‘experiments’ as an engine for successful technology transfer in a human-based
development environment, and predicts a growing importance of experimental
work in the future.


1. Motivation. The software domain can be characterized by two major
facts: (1) the gap between the state of the art as taught at universities and
the state of the practice as ‘lived’ in most commercial software
development environments is significantly wider than in other
engineering domains, and (2) the body of knowledge available to
practitioners consists predominantly of technologies (e.g., languages,
techniques, and tools), rather than methods and knowledge regarding
the effects of such technologies in practical development
environments. One conclusion is that progress in practice is hindered
not by a lack of technology but by a lack of this latter knowledge,
which holds back the transfer into practice. Consider one example
technology: a very large number of testing techniques exists today.
However, little is known about the relative strengths and weaknesses
of these techniques in different industrial settings. So why would a
project manager decide to use an alternative testing technique as
opposed to the one that has been in use for several years? What is needed
can best be compared to the so-called ‘engineering handbooks’ in other
engineering disciplines. Such handbooks describe the available
technologies together with their applicability, strengths, and
weaknesses under different constraints. This paper describes how such
knowledge can be accumulated in a human-based development
environment via experimentation.

2. Experimentation. There exist many different ways of accumulating
software development knowledge. One very important form of such
knowledge is experience derived from the actual application of
technologies. That means experience is based on product/process
feedback loops in which process technology is applied, the impact on the
resulting products is observed, and possible improvements to
the process technology are identified via root cause analysis. In the
context of this paper, experience that arises accidentally from projects
or exists only implicitly is not considered. However, all
experiences that result from systematic hypothesis testing in either fully
controlled laboratory experiments or semi-controlled field
experiments and field case studies, and that produce explicitly sharable
insights (models), are considered. Experiments are one of the pre-
requisites for sustained learning; it is much easier to change behavior
based on documented first-hand experience than on knowledge
from the world at large. Experiments are applicable to basic research
for the purpose of understanding; to applied research for the purpose
of packaging technologies together with information about their
effects in varying project contexts; to teaching & training in order to
experience the benefits of new technologies for one’s own
development tasks before project pressure can cause a fallback
to the old technologies for fear of risk to one’s own
performance; and to technology transfer for the purpose of adapting
new technologies optimally to one’s project context and providing
cost/benefit information.

3. The Role of Experimentation in Software Engineering. The
software domain has a number of specific
characteristics. The most important ones are that most development
technologies are human-based and that the data are less frequent and
mostly of a non-parametric nature. The human-based nature of most
technologies means that (a) the change process is particularly hard, as the
‘execution engine’, the human being, needs to be convinced of the benefits
of changing to a new technology, and (b) the success of any new
technology depends on adherence to the process guidelines
associated with that new technology. Both involve weighing the risk
of using the new technology against the risk of staying with the old
technology. Basically, the cardinal question is ‘Does it work for
ME?’. Experience data from one’s very own environment are an important
source of confidence for changing to and staying with a new
technology. The less frequent and mostly non-parametric nature of
software engineering data requires different techniques for data
analysis, especially the combination of qualitative and quantitative
analysis. Beyond that, many of the experimental techniques known
from other areas can be applied.
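
As one illustration of a technique suited to small, non-parametric data sets, the following Python sketch computes the Mann-Whitney U statistic, a rank-based comparison of two samples. The choice of this particular test is an assumption made here for illustration; the text above does not name a specific method. The function name and the defect counts are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using midranks for tied values."""
    values = list(sample_a) + list(sample_b)
    # Sort the pooled values while remembering their original positions.
    pooled = sorted((value, index) for index, value in enumerate(values))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = midrank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical data: defects found per reviewer with two reading techniques.
technique_a = [3, 5, 4, 6, 2]
technique_b = [7, 8, 5, 9, 6]
u_a, u_b = mann_whitney_u(technique_a, technique_b)
print(u_a, u_b)
```

The smaller of the two U values would then be compared against a critical value for the given sample sizes. Because ranks rather than raw values are used, no assumption about the underlying distribution is needed, and the quantitative result can be complemented by a qualitative analysis of why individual data points differ.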

4. Available Tool Box for Experimental Software Engineering. The
existing body of technologies for experimentation in software
engineering itself is significant and growing constantly. Most of the
techniques were initially created in (or at least
stimulated by) NASA’s Software Engineering Laboratory (SEL).
Among the most important technologies are

- the Goal/Question/Metric (GQM) approach for measurement (e.g., 
[Bas93.1], [Rom91]), supporting the derivation of metrics from a 
comprehensive goal specification 

- the Quality Improvement Paradigm (QIP) method (e.g., [Bas93.2]), 
enabling the integration of sound project feedback for project 
control with cross-project learning (NOTE: It adapts the 
Plan/Do/Check/Act approach from manufacturing to the specifics 
of the software domain) 


- the Experience Factory (EF) approach (e.g., [Bas93.2]), defining 
extra learning related roles and integrating them with the 
traditional software development roles 

- a portfolio of experimental designs (e.g., [Bas86]), ranging from 
controlled experiments to regular field case studies 

- a variety of analysis methods (e.g., [Bri92]) for non-parametric 
software engineering data, integrating qualitative and quantitative 
analysis techniques 

In addition, there exist 

- a number of reference laboratory environments applying the above
experimental technologies such as NASA’s SEL as the ‘mother of
all laboratory environments’, Fraunhofer IESE in Germany, and
CAESAR in Australia

- a number of exchange forums such as the International Software
Engineering Research Network (ISERN) for researchers or the
Software Experience Consortium (SEC) for practitioners

- a growing number of conferences (e.g., METRICS, SEL
Workshop) and journals (e.g., International Journal for Empirical
Software Engineering)

All this provides a sound starting point for experimental work. The 
ISERN Network is open to everybody interested in further develop- 
ing the experimental technologies, teaming up in concrete technology 
experiment replication, and exchanging all kinds of experiences. The 
contact address is ‘isern@informatik.uni-kl.de’. The SEC Consortium
is open for application by companies active in the area of empirical 
work or corporate experience management. The contact address is 
‘fshull@fc-md.umd.edu’. 

5. Fraunhofer IESE: An Institute built on the Experimental
Paradigm. The Fraunhofer Gesellschaft e.V. in Germany is Europe’s
largest applied research and technology transfer organization. It
consists of 48 institutes ranging in application domain from material
sciences and production technology to information & communication
technology and life sciences. These institutes receive approximately
30% base funding from government; the remaining 70% of their
operating budgets have to be covered from industry project income.
The Fraunhofer Institute for Experimental Software Engineering
(IESE) became the 48th permanent Fraunhofer Institute [Rom96].
Founded in 1996, its area of competence is software engineering; its
applied research and transfer model is based on the experimental
paradigm. That means Fraunhofer IESE helps companies to establish
experimentally based learning organizations as a pre-requisite for 
sustained improvements, and then helps them introduce new 
innovative software development technologies (technical & 
managerial). With the base funding from government, technologies 
from basic research institutions are being evaluated via 
experimentation, and packaged together with the experimental results 
for transfer into specific domain and company environments. 

Today, Fraunhofer IESE employs 80 full-time scientists together
with about 60 part-time personnel such as students and consultants.
The institute language is English; 25% of the personnel are non-German.
The percentage of industrial income has risen to about 70% within 
three years. Collaborations include a large number of Europe’s 
leading companies in the sectors of telecom, automotive & aerospace, 
and banking/insurance/trade. 

Fraunhofer IESE was created as the German instantiation of the
NASA/SEL laboratory model. It was widely accepted that a closer
collaboration between academia and industry was needed. This
institutionalized model, which allows for long-term trusting relationships
between academia and companies, was the answer. The reference to
the working SEL example was one of the major arguments that finally
convinced companies and government of the opportunity at hand.
Many of the concepts of IESE are based on my own SEL experiences
during my tenure at the University of Maryland and my
involvement with NASA/SEL during the 1986-1991 time frame.

The main SEL concepts adopted include

- provision of an environment in which researchers, software 
developers, and customers can work together 

- use of experimentation as a major research and technology transfer 
engine 

- establishment of long-term relationships with development
organizations

- exposing researchers to practice and practitioners to research

- having research driven by practical needs (= applied research!)

However, there are some important differences compared to the SEL. 
They include 

- operation as a business due to the fact that Fraunhofer Gesellschaft
e.V. is a legal not-for-profit entity not associated with any
university or for-profit company environment (business plan for
140 employees!)

- tougher sales job for close academia/industry collaboration due to
a historically wider gap between academia and industry in
Germany as compared to the US

- need for critical mass of personnel in IESE core competence areas
due to the expectation by companies to support them
strategically (i.e., long-term, always with experienced personnel)

- need for application sector know-how in addition to software
engineering competence due to the fact that IESE collaborates with
companies from different industry sectors

- need for complex incentive structure in order to provide equal
motivation to researchers and practitioners working in IESE

Although many of the experiences and lessons learned within the
SEL could be reused, the changes due to the collaboration culture and
heterogeneity in customer base posed the biggest challenges.
However, the high standing that IESE has achieved within the scientific and
industrial community demonstrates the possibility of replicating the
SEL experience.

6. Useful Applications. This section briefly describes some of the
typical applications of the experimental paradigm within
Fraunhofer IESE. In line with IESE’s mission, these applications comprise
applied research, teaching & training, and technology transfer. The
intent is to show the wide applicability and usefulness of
experimentation, even in a very industry-oriented setting.

6.1. Applied Research. It has been firmly established at IESE that
applied research in software engineering produces new/refined/existing
technologies together with recorded observations regarding their
effectiveness in one industrial setting or a class of settings (i.e., certain
constraints). These observations need to be produced by some
appropriate form of experimentation. They are only
useful if the underlying experiment is documented well enough to be
repeatable by anyone challenging the findings or trying to replicate
them in a slightly different environment. Observations from non-
repeatable experiments do not contribute to the state of the art. In that
context, it must also be agreed that experiments with negative results
are equally valuable. Negative results combined with qualitative
analysis investigating possible causes and deriving new hypotheses
contribute to learning. There exist only badly designed and/or
performed experiments, no bad results!

Such experiments have been done for most of the IESE technologies
ranging from software development to management and experimental
technologies. The most prominent experiments include the

- effectiveness & efficiency of step-wise abstraction code reading
(e.g., [Bas87])

- effectiveness & efficiency of perspective-based requirements
reading (e.g., [Bas96])

- maintainability of well-structured OO programs (e.g., [Bri97])

- maintainability of well-documented (traceability from requirements
to code) programs

- cost/benefit ratio for product line development 

All these experimental results are published in the literature. Most of
them are accessible through the IESE web site. More experiments on
the above as well as other topics are needed. Every software
engineering researcher should feel challenged to participate. The
International Software Engineering Research Network (ISERN)
provides a stimulating environment to learn, share and collaborate.
Please contact ISERN (www.iese.fhg.de/ISERN/, isern@informatik.uni-kl.de)!

6.2. Teaching & Training. Software engineering teaching and training
must include the topic of experimental methods (see, e.g., CMSC735
at the University of Maryland or SE2 at the University of
Kaiserslautern) as well as their practical application, so that students
experience important software engineering principles for themselves (see
the examples from the University of Kaiserslautern below!). Simply
lecturing on software engineering principles too often results in their
being ignored during the next development tasks. Again, the issue is that
changing behavior requires motivation, namely confidence that the risk of
change is manageable.
Experiments as part of teaching can provide the necessary motivation.
During practical industrial training such experiments can be repeated
for the same reason of motivation for change. In addition,
experimentation can demonstrate the applicability of some technology
to the specific company setting and suggest some necessary
adjustments prior to real use.

Together with the University of Kaiserslautern, Fraunhofer IESE has
developed a number of technology demonstration experiments which
are repeated in every graduate-level software engineering
class as well as during industrial training (modified according to
company constraints!). The standard experiments include

- demonstrating the superiority (i.e., effectiveness, efficiency) of 
code reading over unit testing (adaptation of the old ‘Selby’ 
experiment) (e.g., [Lot96]) 

- demonstrating the superiority (better understandability, modifia-
bility) of well-structured OO designs over worse structured ones

- demonstrating the superiority (better modifiability) of traceably
documented programs over less traceably documented ones

- demonstrating the superiority (i.e., effectiveness, efficiency) of 
perspective-based reading of informal requirements over other 
reading techniques 

Each of these experiments has been performed at least three times. 
Comprehensive laboratory packages are available describing the 
experiment and providing key artifacts for easy replication in other 
environments. 

6.3. Technology Transfer. The purpose of experimentation in
technology transfer is twofold. First, before the introduction of a
candidate new technology, experimentation helps to convince
personnel (top management to invest in it, project management to
support it, and project personnel to ‘live’ it under project pressure) of
the potential benefits of a pre-packaged new technology, and it helps
to adapt the pre-packaged technology to the specific needs of the target
organization. Second, during use of the new technology,
experimentation helps to change the technology further in order to
optimize its effects, and it helps to reinforce its continued use and, as
a result, ensures its continued gains.

During its 3-year history, Fraunhofer IESE has contributed to many
sustained process improvements in industry which would have been
impossible without experimentation (e.g., [Lai97]). An extensive list
of company references can be obtained from the IESE web site.

7. Outlook. Experimentation is becoming an integral sub-discipline of
software engineering. Reflecting the general needs of an engineering
discipline and the specific characteristics of the software domain, a
body of technologies and reference applications has been created.
NASA/SEL has been as instrumental to the area of
experimentation as the SEI has been to the area of assessments.
NASA/SEL together with its offspring (e.g., Fraunhofer IESE) has
pioneered the application of experimentation to speed up the
accumulation of shareable, testable & repeatable knowledge in
research, to raise a generation of true software engineers through teaching
and training, and to speed up the infusion of innovative software
development technologies into practice in technology transfer
programs. More and more environments will recognize that
experimentation does not represent additional effort, but rather speeds
up the production of real contributions to the state of the art in
software engineering and their transfer into practice. As the
performance of real experiments requires laboratory set-ups at
universities or in companies, more such environments must be
established.

I wish the SEL a successful future! May it spin off more 
laboratory environments around the globe! May it be valued 
inside NASA as highly as it is outside!

Software Experience Center:

The Evolution of the 
Experience Factory Concept

{frank.houdek, kurt.schneider}@daimlerchrysler.com

Abstract 

The experience factory concept, which evolved at the NASA
Software Engineering Laboratory, is a promising concept geared at
facing the current needs in software development and software process
improvement. Therefore, we at DaimlerChrysler decided to implement
it in several business units to maintain and improve software engi-
neering competence. In our efforts to establish the experience factory
concept, we identified some shortcomings resulting from (unstated) as-
sumptions. In this paper, we point out these assumptions and present
how we evolved the experience factory concept. In particular, we in-
troduced reinfusion concepts, concepts for experience evolution and
for the cost/benefit ratio of experience items. An example taken from our
business units helps to make our findings concrete.


1 Introduction 

Software engineering knowledge is becoming more and more a strategic busi- 
ness competence — both for software and system developing companies. The 
ability to produce high-quality software within a reasonable time and budget 
is becoming critical for economic success. 

The experience factory concept developed by Basili and co-workers in a 
collaboration with the NASA Software Engineering Laboratory, the Univer-
sity of Maryland and the Computer Sciences Corporation [Bas89, BCM+92,
Bas93, BCR94, BC95, BM96] is a promising approach to build up and main- 
tain software engineering knowledge related to the specific needs of an en- 
terprise. 

Hence, in 1997 DaimlerChrysler decided to implement the experience 
factory concept within several software-intensive business units [HSW98, 
LSH99, WHS99, HB99]. In particular, we started initiatives in passenger 
car development, military aircraft development and central IT services, each 
of which is by the corporate research department. Our mission was to estab-
lish the experience factory concept within two to three years. The overall
goal of the initiatives was to improve software development competencies.
The individual goals, however, varied in the business units depending on
their specific demands. Central IT services, for instance, was interested in 
improving their software contracting processes, whereas, in passenger car 
development, defect profiles and defect tracking were important concerns. 

At the beginning of our initiatives, we tried to instantiate the SEL’s
experience factory concept by building up an independent organizational
unit, defining experience documentation procedures and running measure-
ment programs. But these activities are long-term activities and the business
units involved were also seeking short-term benefits. Their motivation to act
as partners in the experience factory initiatives was to achieve significant im-
provements in their situation and sustain them within the initiative schedule.
As a consequence, we were forced to evolve the experience factory concept 
in order to initiate short- and long-term experience-based improvements in 
parallel. We call the resulting approach software experience center (SEC) 
and we will discuss our findings in this paper in some detail. 

1.1 Structure of the Paper 

Section 2 briefly summarizes the SEL experience factory concept and em- 
phasizes the assumptions behind it. Building on this concept, we introduce 
three necessary dimensions of evolution for the ‘classic’ experience factory 
concept in Section 3. Section 4 gives an example illustrating the evolved 
concepts. Section 5 summarizes the findings of this paper.

2 The Origin: SEL Experience Factory 

Process improvement is hard work. Deficits have to be identified, improve- 
ment activities must be defined and implemented, and their effects mon- 
itored. This is how most improvement approaches work. However, these 
activities are only partially useful for a single project which has to create 
a product within a given schedule and cost frame. To make improvement 
activities successful in the long run, project concerns have to be clearly
separated from improvement concerns. 

This insight was the main trigger for the experience factory concept, 
which is based on the quality improvement paradigm (QIP, see [BCR94]). 
The experience factory concept proposes a (logical and organizational) sep- 
aration of project organization (responsible for building products) and im- 
provement organization (responsible for improving processes within and 
across projects). The experience factory organization supports individual 
projects by providing them with experience gained from work in previous 
projects. The observations made in the new project are, in turn, used to up-
date the organization’s experience base (see Figure 1). In this way, a cross-project
learning process comes alive.

[Figure: two panels labeled "Project organization" and "Experience factory organization"]

Figure 1: Experience factory concept [BCR94].


This concept is rather obvious, yet it helped to clarify our thinking, and it 
helped us a lot in many improvement projects at DaimlerChrysler [HSW98]. The 
experience factory concept provides a long-term vision for improvement 
initiatives tailored to the needs of particular business units. 

Even though this concept is obvious, it has several implications and 
makes several assumptions: 

• Long-term activity. The improvement approach underlying the ex- 
perience factory concept is the QIP. According to the QIP, process 
improvements are, by their nature, long-term. First, the actual situation 
has to be baselined. Then, improvement activities are defined, 
implemented and assessed. Typical time-frames for QIP-based im- 
provements are one to three years. 

• Additional effort. From the perspective of a single project, process 
improvement and learning require additional resources (e.g. for measuring) 
which do not pay off immediately. 

• Common understanding. An important step in every improvement 
initiative is defining an improvement goal. To do so, people need to 
know and articulate their needs accordingly. 

• Similar projects. The basic idea of the experience factory concept is to 
learn in one project and to transfer the gained experience to another 
one. The essential prerequisite is that both projects are sufficiently 
similar. 

• Processes in place. Process improvement requires fairly mature pro- 
cesses that are beyond the ad-hoc stage. 

• ‘Homo economicus’. Improvement activities have similarities with 
farming. One has to seed now (spending some effort) to harvest (some 
more) in the future. Common sense tells us that this is a reasonable 
thing to do. However, humans do not always act reasonably with 
respect to long-term economic considerations. 

• Will to change. Improvement is almost always tied to change: 
changing processes, changing responsibilities, changing personal behavior. 
But changing is never easy. Although it is reasonable to 
change, people are often reluctant to do so. 

• Pull for external knowledge. Learning across projects is essential in 
the experience factory concept. This requires that people are willing 
to learn and to accept knowledge and experience gained in 
other environments (i.e., projects). Moreover, people have to ask for 
knowledge, trawl for experience items, and seek better processes. So 
there must be an active pull for helpful information. 

• Management support. Every change needs a powerful sponsor. To 
bring the experience factory concept to life, permanent support from 
powerful sponsors (i.e. management at all levels) is mandatory. 

In most environments, there are some deficits concerning the issues 
mentioned above. In particular, a long-term commitment at all levels 
(management, project members) is hard to uphold. An external observer would argue 
that it is worthwhile to spend effort on activities whose return on investment 
is not immediate. However, project workers who are permanently 
‘up to their necks in hot water’ have a slightly different perception. They 
can accept only short-term initiatives. They want to see improvement right 
now. 

For these reasons it is not sufficient to introduce the ‘classic’ experience 
factory concept in a ‘typical’ organization. In the next section, we show 
how we have evolved the experience factory concept to cope with the 
above-mentioned issues. 

3 Dimensions of Evolution 

In our experience factory initiative at DaimlerChrysler, we began with the 
‘classic’ experience factory concept. But after a short time it became obvious 
that it is impossible to uphold the long-term commitment required without 
short-term benefits for the persons involved (see [HSW98, WHS99]). We 
were forced to evolve the ‘classic’ experience factory concept. 

Figure 2: Dimensions of evolution (reinfusion concepts; ‘quality of information’; initial seed, evolution, reseeding). 

Figure 2 sketches the identified dimensions of evolution graphically. In 
the following, we focus on them in some detail: 

• Reinfusion concepts. The ‘classic’ experience factory concept emphasizes 
experience collection, for example measurement programs, 
model building, formalization and generalization. Reinfusion of experience, 
i.e. delivery of experience items into projects, is seen to happen 
naturally (after some tailoring). 

This assumption does not hold for several reasons: (1) People do not 
ask for relevant experience items by themselves. Typically, they do 
their job as best they can. Therefore, it is crucial to provide experience 
items for the task at hand at the right time [FLO+96, LS97]. (2) People 
do not know that they might need some additional experience items. 
Either they assume there is nothing relevant in the experience base 
or they do not even recognize their current job as being experience- 
intensive. 

• ‘Quality of information’. Measurement-based information and derived 
models are the prime experience items provided by the ‘classic’ experience 
factory. This type of experience is desirable because it provides 
detailed and objective information. However, gathering it is laborious 
and time-consuming. The time delay from experience collection 
to harvested benefit is fairly long (sometimes several years). To stay 
alive in view of short-term expectations, experience items with shorter 
reuse cycles are also required. Of course, their potential benefit might 
be only slight, as the information is less consolidated and more subjective. 
Figure 3 gives examples of different types of experience items. 
It also qualitatively depicts the trade-off between the effort needed to 
build a particular experience item and its expected benefit. Before 
building a new experience item, its utility (i.e. the ratio of expected 
benefit to needed effort) should be assessed. 

Figure 3: Quality of information. 

• Initial seed, evolution and reseeding. The ‘classic’ experience factory 
concept is driven by the QIP. This 
implies that there is always a clear goal for improvement activities 
and experience collection. This assumption does not hold in practice. 
We often observe a moving-target situation, i.e. the goal and the as- 
sociated needs change over time, sometimes by accident, sometimes 
due to the experience items delivered. A more dynamic approach is 
thus needed to avoid wasting a great deal of effort on experience-building 
activities (e.g. measurement programs) which provide experience 
items that are not really helpful. A closed feedback-loop of 
seeding (i.e., providing some cheap experience items), evolution (of 
needs) and reseeding (i.e. adjusting experience items and experience 
collection processes) is essential [Fis98]. 
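
The utility assessment mentioned in the 'quality of information' item (the ratio of expected benefit to needed effort) can be sketched in a few lines of Python. This is only an illustration: the item names and all numbers are hypothetical and are not measurements from the initiative, and the class and function names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperienceItem:
    name: str
    expected_benefit: float  # hypothetical estimate, e.g. person-days saved
    needed_effort: float     # hypothetical estimate, e.g. person-days to build

    @property
    def utility(self) -> float:
        """Ratio of expected benefit to needed effort."""
        if self.needed_effort <= 0:
            raise ValueError("needed_effort must be positive")
        return self.expected_benefit / self.needed_effort


def rank_by_utility(items):
    """Return items sorted from highest to lowest utility."""
    return sorted(items, key=lambda item: item.utility, reverse=True)


# Illustrative candidates only; the figures are invented.
candidates = [
    ExperienceItem("how-to note", expected_benefit=6.0, needed_effort=2.0),
    ExperienceItem("measurement program", expected_benefit=60.0, needed_effort=40.0),
    ExperienceItem("checklist", expected_benefit=4.0, needed_effort=1.0),
]

for item in rank_by_utility(candidates):
    print(f"{item.name}: utility {item.utility:.2f}")
```

With these invented figures the cheap items rank first, which mirrors the seeding idea: start with inexpensive items and reseed as needs evolve.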


4 Example 

In this section, we present an example taken from our experience factory ini- 
tiatives to illustrate how we evolved the ‘classic’ experience factory concept 
in practice. 

This example is from the central IT services business unit. This unit is 
involved in large projects developing systems for administrative purposes like 
global sales, warranty management or diagnosis. Typically, such systems are 
not built in-house but contracted out to one or more suppliers. Central IT 
services is responsible for contractor management and associated activities 
such as acceptance processes or quality definition. 

Our mission (corporate research) was to establish an initial experience 
factory group there. The experience items they were to maintain were aimed 
at supplementing all the activities concerned with contracting software out 
and performing acceptance tests at delivery time. At the beginning of our 
activities, we acted as ‘experience factory guys’. With time, people from the 
central IT services were to take over our roles. 

We started (according to the QIP) with an extensive baselining to iden- 
tify the existing processes, quality needs, etc. 

During this work, we encountered a ‘pull’ situation, i.e. demand for 
experience items (in this case, processes for contract evaluation) arose. 
This issue had not been covered by the experience factory activities 
implemented so far. Setting up a serious analysis of existing processes 
(as the ‘classic’ experience factory concept would imply) would have resulted 
in a long-term activity. Instead, we performed interviews, studied relevant 
literature and (company) standards, tailored the findings towards the actual 
needs and provided simple guidelines (how-to notes). 

Building on our baselining activities, we encountered some other questions 
which were not directly articulated by the projects but which might 
become vital in future activities (e.g. risk assessment, role of the quality 
manager). Consequently, we also built experience items for these topics. Unlike 
the contract evaluation process item, we had to sell these experience items. 
This was mainly because people were not aware of the utility of these issues 
(e.g. risk management). 

We used the experience gained in selling and applying these items to improve 
the existing experience items. Figure 4 depicts the flow of experience across 
several projects in the central IT services business unit. The figure does not 
indicate, however, whether a flow of information was initiated by pull or push. 
There were variations across projects and over time. 


[Figure: experience items exchanged between the projects and the experience base, including acceptance process, QM tasks, reviews at the supplier, contractual issues and hints, evaluation of bids, risk portfolio, risk sheets, risk analysis (advanced, goal-based), risk management, quality models, quality circle, role of quality manager, and QM as an IT purchaser] 

Figure 4: Experience transfer at central IT services. 

It is important to realize that there was neither an initial pull for most 
of these items nor a clear agreement that these and only these items were 
relevant. 

With respect to the above-mentioned evolution dimensions, we made the 
following contributions: 

• The created experience items were fairly cheap to build, with only 
a limited benefit but a positive cost/benefit ratio (e.g. checklist for 
contracts, initial quality model). Experience items of better quality 
were only built in cases where return on investment was anticipated 
within a reasonable time-frame. In particular, the fact that the experience 
factory initiative showed benefit to the projects within a rather short 
time helped us greatly to become accepted in this business unit. 

• It was not clear from the beginning where to go as the people involved 
were unable to articulate their particular needs. So we were not able 
to start with a clear goal in mind but had to work iteratively and providently. 
We started with an initial seed (hints on writing contracts) 
and evolved the experience base content over time. Starting a QIP 
program would not have produced the same output. The goal identification 
would only have raised topics which the people were aware of. 
However, we found some items to be extremely helpful which would 
not have been raised, as people did not know them. 

• Rarely did people seek experience items. So the assumption that a 
filled experience base is enough to make an experience factory helpful 
proved to be false. 

More often, we (as the experience factory guys) had to push our items 
in meetings and project planning sessions. More details on the relation 
of pull versus push (which is the main reason for reinfusion concepts) 
can be found in [WHS99]. 

5 Summary 

The experience factory concept which was evolved at the NASA Software 
Engineering Laboratory is a promising concept geared to the current needs 
in software development and software process improvement. It addresses 
the burning issues of a business unit rather than proposing one-size-fits-all 
solutions. 

To understand its transferability to other environments, it is important 
to understand its evolution and its assumptions. The experience factory 
concept is the outcome of many years of work performed by Basili and co- 
workers [BCM+92] at the NASA SEL. It is a result of process improvement 
activities according to the PDCA principle (i.e. QIP [BCR94]) and the 
perception that successful improvement activities must be separated organizationally 
from project work. At SEL, it was never the goal to ‘build 
an experience factory’; rather, the resulting organization was called an 
experience factory ex post, after it had grown for several years. 

If you intend to establish the experience factory concept within a fairly 
short time-frame (e.g. two to three years), some shortcomings of the concept 
become obvious, resulting from (unstated) assumptions behind the concept. 
Primarily, it is assumed that project people believe in the (long-term) ben- 
efits of an experience factory. 

In our experience factory initiative at DaimlerChrysler, we identified 
some areas in the experience factory concept that need to be evolved. In 
particular, we recognized the need for reinfusion concepts, concepts for ex- 
perience evolution and a continuum of experience items ranging from easy- 
to-build but short-term-benefit items (e.g. how-to notes, expert networks) 
to solid high-impact packaged experience (e.g., results from GQM measure- 
ment programs). 

We call the evolved experience factory concept the ‘software experience 
center’ (see Figure 5). 



Figure 5: Software experience center. 

Software testing is a well-defined phase of the software development life cycle. 
Functional ("black box") testing and structural ("white box") testing are two methods of 
test case design commonly used by software developers. A lesser known testing method 
is risk-based testing, which takes into account the probability of failure of a portion of 
code as determined by its complexity. For object oriented programs, a methodology is 
proposed for identification of risk-prone classes. 


Risk-based testing is a highly effective testing technique that can be used to find and fix 
the most important problems as quickly as possible. Risk can be characterized by a 
combination of two factors: the severity of a potential failure event and the probability of 
its occurrence. Risk can be quantified by using the equation 


$$\text{Risk} = \sum_{i=1}^{n} p(E_i) \, c(E_i),$$ 

where n is the number of unique failure events, E_i (i = 1, 2, ..., n) are the possible failure 
events, p is probability and c is cost. 
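
As a minimal sketch of the risk equation above, the following Python function sums probability times cost over a list of failure events. The function name and the event values are hypothetical and serve only to show the arithmetic.

```python
def total_risk(failure_events):
    """Sum p(E_i) * c(E_i) over all failure events.

    failure_events: iterable of (probability, cost) pairs.
    """
    total = 0.0
    for probability, cost in failure_events:
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range: {probability}")
        total += probability * cost
    return total


# Hypothetical events: (probability of occurrence, cost of failure).
events = [(0.5, 100.0), (0.25, 400.0), (0.125, 800.0)]
print(total_risk(events))  # 50 + 100 + 100 = 250.0
```

Note that a rare but expensive event (the third one) contributes as much to the total as a more likely, cheaper one, which is why both factors are needed before test planning.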

Risk-based testing focuses on analyzing the software and deriving a test plan weighted on 
the areas most likely to experience a problem that would have the highest impact 
[McMahon]. This looks like a daunting task, but once it is broken down into its parts, a 
systematic approach can be employed to make it very manageable. 

The severity factor c(Ei) of the risk equation depends on the nature of the application and 
is determined by domain analysis. For some projects, this might be the critical path, 
mission critical, or safety critical sections. Severity assessment requires expert 
knowledge of the environment in which the software will be used as well as a thorough 
understanding of the costs of various failures. Musa addresses how to estimate the 
severity of software failures in the discussion of "Operational Profiles" in his book, 
Software Reliability Engineering. Both severity and probability of failure are needed 
before risk-based test planning can proceed. Severity assessment is not addressed here 
because it involves so much application-specific knowledge. Instead we confine the 
remainder of the discussion to the first part of the risk equation, ranking the likelihood of 
component failures, p(Ei), and a way to capture the information directly from the source 
code, independent of domain knowledge. 

The first task of risk-based testing is to determine how likely it is that each part of the 
software will fail. It has been shown that code that is more complex has a higher 
incidence of errors or problems [Pfleeger]. For example, cyclomatic complexity has been 
demonstrated as one criterion for identifying and ranking the complexity of source code 
[McCabe]. Therefore, using metrics to predict module failures might simply mean 
identifying and sorting them by complexity. Then using the complexity rankings in 
conjunction with severity assessments from domain risk analysis would identify which 
modules should get the most attention. But module complexity is a univariate measure, 
and it could fail to detect some very risk-prone code. In particular, object oriented 
programming can result in deceptively low values for common complexity metrics. The 
nature of object oriented code calls for a multivariate approach to measure complexity 
[Rosenberg]. 
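
To illustrate the univariate ranking described above, here is a rough sketch that approximates cyclomatic complexity for Python functions by counting decision points with the standard `ast` module and then sorts the functions by that score. It is an approximation chosen for illustration, not the tool used in the studies cited; the sample source and function names are hypothetical, and nested functions are counted as part of their enclosing function.

```python
import ast

# Node types that open an additional path through the code.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                   ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(func_node):
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            complexity += len(node.values) - 1
    return complexity


def rank_functions(source):
    """Return (name, complexity) pairs, most complex first."""
    tree = ast.parse(source)
    scores = [
        (node.name, cyclomatic_complexity(node))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x, y):
    if x > 0 and y > 0:
        for i in range(x):
            if i % 2:
                y += i
    elif x < 0:
        y = -y
    return y
'''

for name, score in rank_functions(SAMPLE):
    print(name, score)
```

A ranking like this looks only at control flow inside each function, which is exactly why it can miss risk that comes from coupling and inheritance in object oriented code.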

We are going to narrow the topic further and focus specifically on object oriented 
software. The Software Assurance Technology Center (SATC) at NASA Goddard Space 
Flight Center has identified and applied a set of six metrics for object oriented design 
measurement. These metrics have been used in the evaluation of many NASA projects 
and empirically supported guidelines have been developed for their interpretation. The 
metrics are defined as follows: 

1. Number of Methods (NOM) is a simple count of the different methods in a class. 

2. The Weighted Methods per Class (WMC) is a weighted sum of the methods in a class 
[Chidamber]. If the weights are all equal, this metric is equivalent to the Number of 
Methods metric. The Cyclomatic Complexity [McCabe] is used to evaluate the 
minimum number of test cases needed for each method. Weighting the methods with 
their complexities yields a more informative class metric. 

3. Coupling Between Objects (CBO) is a count of the number of other classes to which 
a class is coupled. It is measured by counting the number of distinct non-inheritance 
related class hierarchies on which a class depends [Chidamber]. Coupled classes 
must be bundled or modified if they are to be reused. 

4. The Response for a Class (RFC) is the cardinality of the set of all methods that can be 
invoked in response to a message to an object of the class or by some method in the 
class [Chidamber]. 

5. Depth in Tree (DIT) - The depth of a class within the inheritance hierarchy is the 
number of jumps from the class to the root of the class hierarchy and is measured by 
the number of ancestor classes. When there is multiple inheritance, use the maximum 
DIT. 

6. Number of Children (NOC) - The number of children is the number of immediate 
subclasses subordinate to a class in the hierarchy. 
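
Three of the metrics defined above (NOM, DIT and NOC) can be sketched with Python introspection. This is an illustration only: the class hierarchy is hypothetical, treating Python's built-in `object` as lying outside the hierarchy is an assumption, and the method count here covers plain functions defined in the class body (not inherited methods, static methods or class methods).

```python
import inspect


def number_of_methods(cls):
    """Count functions defined directly in the class body (not inherited)."""
    return sum(1 for member in vars(cls).values() if inspect.isfunction(member))


def depth_in_tree(cls):
    """Longest path from cls up to a root class, ignoring `object`."""
    parents = [base for base in cls.__bases__ if base is not object]
    if not parents:
        return 0
    # With multiple inheritance, the maximum depth is used.
    return 1 + max(depth_in_tree(parent) for parent in parents)


def number_of_children(cls):
    """Number of immediate subclasses."""
    return len(cls.__subclasses__())


# Hypothetical hierarchy used only to exercise the functions.
class Shape:
    def area(self): ...
    def perimeter(self): ...

class Polygon(Shape):
    def vertices(self): ...

class Triangle(Polygon):
    def area(self): ...

class Circle(Shape):
    def area(self): ...


for cls in (Shape, Polygon, Triangle, Circle):
    print(cls.__name__, number_of_methods(cls),
          depth_in_tree(cls), number_of_children(cls))
```

Here `Shape` is a root (DIT 0) with two children, while `Triangle` sits two levels down, so any complexity in `Shape` or `Polygon` is inherited by it without showing up in its own method count.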

Having defined the metrics, we need interpretation guidelines to assist in identifying 
those areas of code deemed to be at high risk. For over three years, the SATC has been 
collecting and analyzing object oriented code written in both C++ and Java. Over 20,000 
classes have been analyzed, from more than 15 programs. The results of the analyses 
have been discussed with project managers and programmers to identify threshold values 
that do a good job of discriminating between “solid” code and “fragile” code.* Once the 
individual metric thresholds were determined, analysis revealed that a multivariate 
approach provided an excellent basis for planning risk-based testing. 

When we first began to apply some of the traditional metrics to object oriented code, we 
saw that their values were generally much lower than we were accustomed to seeing for 
functionally written code. Judging by the old thresholds, the OO code appeared to be 
much less complex and much more modular than the non-OO legacy code. But because 
of the fundamentally different way an OO system is built, the low numbers were often 
very deceptive - ignoring the interactions between classes, and missing the complexities 
due to the use of inheritance. The following threshold values for the individual metrics 
were derived from studying the distributions of the metrics collected. 

• Number of methods (NOM) - < 20 preferred, < 40 acceptable per class. The 
counting tool included explicit constructors and destructors in the method counts, so 
these thresholds are inflated. Taking that into account, the recommended number of 
actual implemented methods translates to under 10 per class. 

• Weighted Methods per Class (WMC) - < 25 preferred, < 40 acceptable. The 
number of methods and the complexity of those methods are a predictor of how much 
time and effort is required to develop and maintain the class. While the NOM may be 
inflated by the beneficial use of constructors, WMC provides a better idea of the true 
total complexity of a class. 

• Response for Class (RFC) - < 50. We have seen very few classes with RFC over 
50. If the RFC is high, this means the complexity is increased and the 
understandability is decreased. The larger the number of methods that can be invoked 
from a class through messages, the greater the complexity of the class, complicating 
testing and debugging. Making changes to a class with a high RFC will be very 
difficult due to the potential for a ripple effect. 

• RFC/NOM - < 5 for C++, < 10 for Java. This adjusted RFC metric does a good 
job of sifting out classes that need extensive testing, according to developer feedback. 
The Java language enforces the use of classes for everything, which automatically 
drives up the value of this metric. 

• Coupling Between Objects (CBO) - < 5. A high CBO indicates classes that may 
be difficult to understand, reuse or maintain. The larger the CBO, the higher the 
sensitivity to changes in other parts of the design and therefore maintenance is more 
difficult. Low coupling makes the class easier to understand and less prone to 
spawning errors; it also promotes encapsulation and improves modularity. 

• Depth in Tree (DIT) - > 5 means that the metrics for a class probably understate its 
complexity. A DIT of 0 indicates a “root”; a higher percentage of DITs of 2 and 
3 indicates a higher degree of reuse. A majority of shallow trees (DITs < 2) may 
represent poor exploitation of the advantages of OO design and inheritance. On the 
other hand, an abundance of deep inheritance (DITs > 5) could be overkill, taking 
great advantage of inheritance but paying the price in complexity. When there is such 
liberal use of inheritance, the aforementioned class metrics will understate the 
complexity of the system. 

• Number of Children (NOC) - The greater the number of children, the greater the 
likelihood of improper abstraction of the parent and need for additional testing, but 
the greater the number of children, the greater the reuse since inheritance is a form of 
reuse. While there is no “good” or “bad” number for NOC, its value becomes 
important when a class is found to have high values for other metrics. The 
complexity of the class is passed on to all of its child classes and total system 
complexity is greater than it seemed at first glance. 

* It should be noted that the values of some of the OO metrics depend just as much on the design as they do on the 
actual coding. Much of the complexity of an OO system is fully determined before the programmers begin to write the 
code. Design complexity measurement is another topic that deserves researchers’ attention. 

A single metric should never be used alone to evaluate code risks; it takes at least two or 
three to give a clear indication of potential problems. Therefore, for each project, the 
SATC creates a table of high risk classes. A class is identified as high risk if at least 
two of its metrics exceed the recommended limits. Table 1 is an example of information 
that would be given to a project. The classes that exceed the expected limits are shaded. 


Class      # Methods    CBO          RFC   RFC/NOM      WMC   DIT   NOC 
Class 1    54           8            536   [illegible]  175   1     0 
Class 2    7            6            168   24           71    4     0 
Class 3    33           4            240   1.2          105   2     0 
Class 7    [illegible]  8            361   6.7          117   2     2 
Class 8    [illegible]  6            378   6.1          163   2     0 
Class 10   63           [illegible]  235   3.7          156   2     0 
Class 11   81           10           285   3.5          161   2     0 
Class 12   [illegible]  5            127   3.0          69    3     0 
Class 14   20           17           324   16.2         139   4     4 
Class 18   46           5            11    4.0          238   1     3 

Table 1: High Risk Java Classes 

The purpose of the above information is to identify the classes at highest risk for error. 
While there is insufficient data to make precise ranking determinations, there is enough 
information to justify additional testing of classes which exceed the recommended 
specifications. It is up to the project to determine the criticality of these and the other 
classes to make the final determination on testing. Allocating testing resources based on 
these two factors, severity and likelihood of failures, amounts to risk-based testing. 
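
The high-risk rule (at least two metrics exceeding their recommended limits) can be sketched as follows. The limits are taken from the thresholds listed earlier, but two choices here are assumptions: the "acceptable" values are used as the limits, and a value equal to a "<" limit is treated as exceeding it. The two sample rows are legible rows of Table 1; the third class is hypothetical.

```python
# Limits from the thresholds listed earlier (assumption: "acceptable" values,
# and ">= limit" counts as exceeding a "< limit" recommendation).
LIMITS = {
    "NOM": 40,
    "CBO": 5,
    "RFC": 50,
    "RFC/NOM": 10,  # Java value; the text gives 5 for C++
    "WMC": 40,
}
DIT_LIMIT = 5  # DIT is flagged only when strictly greater than 5


def exceeded_metrics(metrics):
    """Return the names of metrics that exceed their recommended limits."""
    exceeded = [name for name, limit in LIMITS.items()
                if name in metrics and metrics[name] >= limit]
    if metrics.get("DIT", 0) > DIT_LIMIT:
        exceeded.append("DIT")
    return exceeded


def is_high_risk(metrics, minimum_exceeded=2):
    """A class is high risk when at least two metrics exceed their limits."""
    return len(exceeded_metrics(metrics)) >= minimum_exceeded


classes = {
    # Legible rows from Table 1.
    "Class 2": {"NOM": 7, "CBO": 6, "RFC": 168, "RFC/NOM": 24,
                "WMC": 71, "DIT": 4, "NOC": 0},
    "Class 11": {"NOM": 81, "CBO": 10, "RFC": 285, "RFC/NOM": 3.5,
                 "WMC": 161, "DIT": 2, "NOC": 0},
    # Hypothetical low-risk class for contrast.
    "Hypothetical": {"NOM": 8, "CBO": 2, "RFC": 30, "RFC/NOM": 3.8,
                     "WMC": 12, "DIT": 1, "NOC": 0},
}

for name, metrics in classes.items():
    print(name, is_high_risk(metrics), exceeded_metrics(metrics))
```

NOC is deliberately left out of the limits because the text gives no "good" or "bad" value for it; it matters only as context once a class is already flagged.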


Object oriented software metrics can be used in combination to identify classes that are 
most likely to pose problems for a project. The SATC has used the data collected from 
thousands of object oriented classes to determine a set of benchmarks that are effective in 
identifying potential problems. When problematic classes are also identified by domain 
experts as critical to the success of the project, testing can be allocated to mitigate risk. 
Risk-based testing will allow developers to find and fix the most important software 
problems earlier in the test phase. 

Risk-Based Testing - Summary 

Number of Methods per Class: NOM measures both size and complexity of a class. It may be necessary to 
trade off some efficiency to preserve maintainability. 

Weighted Methods per Class: WMC is defined here as the sum of the cyclomatic complexities 
of the methods implemented in one class. 

Response for Class: If RFC is high, the code's complexity is increased and its 
understandability is decreased. 

Response for Class ÷ NOM: Experience tells us that this derived metric does a good job of finding 
classes that require extra testing. Java code tends to have higher RFC values. 

Coupling Between Objects: Coupling is a measure of inter-class complexity, a design issue. The larger 
the CBO, the greater the sensitivity to changes and the more difficult the maintenance. 

Depth in Tree & Number of Children: Deeply nested inheritance may hide complexity. The 
greater the NOC, the more likely the child classes will have improper abstraction. 
Child classes take advantage of reuse, but all will suffer if parent classes are too complex. 

Using Guided Inspection to Validate UML Models 

Melissa L. Major and John D. McGregor 
Software Architects 

Guided inspection is an inspection/review technique that is “guided” by test cases. Inspections are used to 
provide a detailed examination of a design or program by a human, as opposed to a machine’s execution of 
a prototype or completed application. However, even Fagan-style inspection processes focus more on the 
form of the inspection process than on the substance of the material being inspected. Standard 
inspection techniques also focus on examining what is in the inspection material rather than determining 
whether there is something that is missing from the model or code. 

These standard inspections are often a top down reading of the code or a scan of a diagram. The top down 
approach makes the measurement of coverage straightforward but it is more difficult for the inspector to 
ensure that appropriate connections have been made between objects. The use of test cases means that the 
inspection process can address more than just the syntax of the diagram or code being reviewed. The test 
cases come from test plans that are already a required part of the software development process. 

Techniques such as checklists have been used to summarize the results of an inspection and to ensure that 
the inspector does a thorough job. Guided inspection supplements the checklist with the testing concept of 
“coverage”. Coverage measures determine how much of the product being inspected has been examined. 
Test cases are selected from the test plan so that, for example, every use case is represented by at least one 
test case. 

Studies have reported widely varying savings ratios for finding faults early in the development process as 
opposed to during the compilation or system test phases. For example, IBM reported that repairing a fault 
found at system test time may cost as much as 100 times the cost of repairing the same fault found during 
design. With this amount of margin even a technique that is relatively expensive can still result in time and 
cost savings. 

The Testing Perspective Applied to Inspections 

Applying a testing approach to inspections provides several benefits. In this section we examine these 
benefits and provide guidelines for the inspection process that ensure these benefits are realized. 

Objectivity 

For testing to be effective it must be conducted objectively. A person testing their own code is seldom 
sufficiently objective to achieve optimum results. If the person has made a wrong interpretation of the 
inputs to their process, that mistake will simply be carried forward into the test cases. A second person, or 
even an automated tool, will provide a different, although not always correct, perspective. 

Guideline #1 Select inspectors from outside the immediate development team. 

Traceability 

For testing materials to be maintainable it must be easy to map changes in the model to needed changes in 
the test scenarios. In an iterative development environment changes occur frequently to all parts of the 
project. Changes in requirements are reflected in changes to the use cases. Changes in class specifications 
should signal the need for regression testing of the affected parts of the model. 

Guided inspection uses scenarios derived from the use case description as the primary test case description. 
A project should maintain a matrix that associates a package with all the use cases for which that package is 
needed. Then each time changes are made to a package, the affected use cases and scenarios are easily 
identified. 



Guideline #2 Maintain a mapping between use cases and the classes/packages that 
realize those use cases. 
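
The mapping called for in Guideline #2 can be kept as a simple lookup table. The sketch below, with hypothetical package and use case names, shows how such a matrix answers the question "which use cases must be re-inspected after a change to these packages?" and how the reverse view can be derived from it.

```python
from collections import defaultdict

# Hypothetical traceability matrix: package -> use cases that need it.
PACKAGE_TO_USE_CASES = {
    "billing": {"Pay invoice", "Issue refund"},
    "accounts": {"Open account", "Pay invoice"},
    "reports": {"Print statement"},
}


def affected_use_cases(changed_packages, matrix=PACKAGE_TO_USE_CASES):
    """Use cases whose scenarios should be re-inspected after a change."""
    affected = set()
    for package in changed_packages:
        affected |= matrix.get(package, set())
    return sorted(affected)


def invert(matrix):
    """Derive the reverse view: use case -> packages that realize it."""
    reverse = defaultdict(set)
    for package, use_cases in matrix.items():
        for use_case in use_cases:
            reverse[use_case].add(package)
    return dict(reverse)


print(affected_use_cases(["billing"]))
print(sorted(invert(PACKAGE_TO_USE_CASES)["Pay invoice"]))
```

Storing only one direction and deriving the other avoids the two views drifting apart as the model changes between iterations.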

Testability 

For testing to be possible, the model must be testable. This implies that the model is sufficiently specific to 
support the evaluation of test execution results. Domain models are general by design and there is a fine 
line between vague generality and sufficient detail to support testing. 

Guideline #3 Assign a team member to write test cases as the modeling proceeds. 
Have the “testing” domain expert review these. Feed back into the modeling process 
any information indicating places where the model is too vague. 

This is common advice that we give to process definers at all levels. There should be a validation activity 
for each development phase. Preparation for that activity should proceed in parallel with the development 
activity. This allows the act of preparation to actually help improve the product before the formal 
validation. Writing test cases is an excellent technique for providing continuous feedback during 
development. 

Coverage 

For testing to provide us with confidence, we need to know how thoroughly the product under test has been 
examined. The general term for this type of metric is coverage. When we speak of “functional” testing, 
we mean that the coverage will be expressed in terms of the functional specification of the product under 
test. The metric is chosen to give some notion of completeness at the appropriate level. 

For guided inspection there are two different possible bases for coverage: the class/state/activity diagrams 
and use case diagrams. The use case diagram is a good source of scenarios; however, we are more 
concerned that the domain model contains a complete set of concepts for the domain. These are 
represented in the class diagram and further clarified in the state and activity diagrams for each class. 

Guideline #4 Use copies of the model’s diagrams and mark off each element in a 
diagram as it is used in a test scenario. 

Developing a test scenario for each actor in the use case diagram is a minimal level of coverage. One 
scenario per primary use case is a stronger coverage criterion. Covering every primary use case and then adding 
coverage for all “alternate courses of action” for use cases that are rated frequent and critical is an even 
stronger criterion. Once the set of scenarios is run through the model, the resulting coverage of the class 
diagram and state diagrams provides a check of the thoroughness with which the model has been inspected. 
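
The marking-off of diagram elements can also be tallied mechanically. The following is a minimal sketch, assuming that each scenario has been recorded as the list of diagram elements it touched; the class and scenario names are hypothetical.

```python
def diagram_coverage(diagram_elements, scenarios):
    """Return (fraction of elements touched, elements never touched)."""
    elements = set(diagram_elements)
    touched = set()
    for used in scenarios.values():
        # Ignore anything a scenario mentions that is not in the diagram.
        touched |= set(used) & elements
    untouched = sorted(elements - touched)
    fraction = len(touched) / len(elements) if elements else 1.0
    return fraction, untouched


classes = ["Account", "Customer", "Order", "Invoice"]
scenarios = {
    "place order": ["Customer", "Order"],
    "pay invoice": ["Customer", "Invoice"],
}
fraction, untouched = diagram_coverage(classes, scenarios)
print(f"{fraction:.0%} of classes covered; not yet used: {untouched}")
```

The list of untouched elements is the more useful output: it points to the scenarios that still need to be written.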

Criteria for a Good Model 

The evaluation criteria that Guided Inspection applies to models are described more completely in [1]. They are: 

• correctness 

• completeness 

• consistency 

Correctness is a measure of how accurately the model represents the information. Correctness of the 
model is really the aggregate of judgements from the individual test cases. Each test case includes a 
description of the results expected from executing the test case. This expected result is based on a source 
that is assumed to be (nearly) infallible, a “test oracle”. The oracle usually is a human expert whose 
personal knowledge is judged to be sufficiently reliable to be used as a standard. The tester judges the 
accuracy of the model’s representation of concepts relative to the results expected by the oracle. 

A model is correct with respect to a test case if the result of the execution is the result that was expected. A 
model is correct if each of the test cases produces the expected results. The problem here is whether the 
“expected” result really is the appropriate one. In the real world, we must assume that the oracle can be 
incorrect on occasion. 


Completeness is a measure of whether a necessary, or at least useful, element is missing from the model. 
It is judged by determining if the entities in the model describe the information being modeled in sufficient 
detail for the goals of the current portion of the system being developed. This judgement is based on the 
model’s ability to represent the required situations and on the knowledge of experts. In an iterative 
incremental process, completeness is considered relative to how mature the current increment is expected to 
be. This criterion becomes more rigorous as the increment matures over successive iterations. 

One factor directly affecting the effectiveness of this criterion is the quality of the test coverage. The 
model is judged complete if the results of executing the test cases can be adequately represented using only 
the contents of the model. For example, a sequence diagram might be constructed to represent a scenario. 
All of the objects needed for the sequence diagram (SD) must come from classes in the class diagram or the 
model will be judged incomplete. However, if only a few test cases are run, missing classes may escape detection. 
In most cases, this type of testing is sufficiently high level that coverage of 100% is achievable and 
desirable. 
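
As a small illustration of this completeness check, the sketch below compares the classes of the objects drawn in a sequence diagram with the classes present in the class diagram. The representation of the diagrams (pairs of object name and class name, and a set of class names) is our assumption, and the names are hypothetical.

```python
def missing_classes(sequence_diagram_objects, class_diagram):
    """Return classes used in a sequence diagram but absent from the class diagram."""
    used = {class_name for _object_name, class_name in sequence_diagram_objects}
    return sorted(used - set(class_diagram))


sd_objects = [("aCustomer", "Customer"), ("theCart", "Cart"), ("anOrder", "Order")]
class_diagram = {"Customer", "Order"}
print(missing_classes(sd_objects, class_diagram))
```

An empty result for every executed test case supports, but does not prove, the judgement that the model is complete; the judgement is only as good as the coverage achieved.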

Consistency is a measure of whether there are contradictions among the various diagrams within the model 
and between models produced during various phases. This may be partially judged by considering whether 
the relationships among the entities in the model allow a concept to be represented in more than one way. 
For example, each name should be unique. In an incremental approach the consistency is judged locally 
until this increment is integrated with the larger system. The integration process must ensure that the new 
piece does not introduce inconsistencies into the integrated model. 

Consistency checking can determine whether there are any contradictions or conflicts present either internal 
to a single diagram or between two diagrams. For example, one diagram, perhaps a sequence diagram, 
might require a relationship between two classes while another diagram, such as the class diagram, shows 
none. Inconsistencies will often initially appear as incorrect results in the context of one of the two 
diagrams and correct results in the other. Inconsistencies are identified by careful evaluation of the results 
of a simulated execution. 
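
The sequence diagram versus class diagram example can be sketched as follows. We assume, for illustration only, that messages are recorded as (sender class, receiver class) pairs and that associations in the class diagram are treated as undirected; the class names are hypothetical.

```python
def unsupported_messages(messages, associations):
    """Return (sender, receiver) class pairs that exchange a message in the
    sequence diagram but have no association in the class diagram."""
    # Associations are treated as undirected in this sketch.
    linked = {frozenset(pair) for pair in associations}
    problems = []
    for sender, receiver in messages:
        if sender == receiver:
            continue  # a message to self needs no association
        if frozenset((sender, receiver)) not in linked and (sender, receiver) not in problems:
            problems.append((sender, receiver))
    return problems


associations = [("Customer", "Order")]
messages = [("Customer", "Order"), ("Order", "Invoice"), ("Order", "Invoice")]
print(unsupported_messages(messages, associations))
```

Each pair reported is a candidate inconsistency; the inspectors still have to decide which of the two diagrams is wrong.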

A Basic Process 

Roles 

There are several roles in this process. Several roles may be assigned to a single person; however, to 
ensure objectivity there should be a clear distinction between the producers of the model under test (MUT) 
and the testers/inspectors. 

Test oracle - These personnel are the source of truth (or at least expected test results). 
They define the expected system response for a specific input scenario. These will 
usually be either domain experts or system engineers. 

Test case writer - These personnel perform the analysis necessary to select test cases. 
They also record the expected result for each test case as defined by the Oracle. These 
people may be developers who did not create the model or system test personnel. 

Symbolic executioner - These personnel provide the actual system response as defined 
to this point in the software development process. These will typically be members of the 
team developing the MUT since they understand the operation of the individual elements 
of the model. 

Moderator - The Moderator controls the session and advances the discussion through the 
scenario. 



Recorder - This person makes modifications to the reference models as the team agrees 
upon changes. The Recorder also makes certain that these changes are taken into 
consideration in the latter parts of the scenario. The Recorder also maintains a list of 
issues to record questions that are not resolved during the testing session. 

Drawer - This person constructs the SD as the scenario is executed. He/she concentrates 
on capturing all of the appropriate details such as returns from messages and state 
changes. They may also annotate the SD with information between the message arrow 
and the return arrow. 


Steps 

The model testing process is tightly coupled with the model development process [2]. We have found it 
useful to iterate within the modeling process by periodically switching from the modeling activity to the 
testing activity. This provides quick feedback and often provides new information to be modeled. 

The basic steps are the same as for any testing process: 

• Analyze - Much of the testing analysis has been done if the use case descriptions 
contain sufficient information to allow them to be prioritized. We use a weighted 
frequency profile to prioritize use cases for testing. The weight is based on how 
critical the use is to the success of the system. 

• Construct - Write scenarios from the use cases. Each scenario must be made more 
specific by providing exact values for attributes. These values are selected by first 
establishing equivalence classes of values. Equivalence classes of values are all values 
that will provide the same behavior in a given context. For example, {0, 1, 2, 3} all 
produce the same response from the statement x > -1 and x < 4. Many of the test 
parameters will be objects. Equivalence class translates to “objects that are in the same 
state no matter how they got there.” Different use cases will have different numbers of 
test cases. We select from some states more frequently than others due to their 
participation in high priority use cases. 

• Execute and Evaluate - The inspection process combines the application of a 
checklist with the execution of test cases. The test session involves role-playing in 
which the modelers and developers step through the model. The test session is a group 
meeting since no individual developer will understand all of the classes in the model 
and few models contain all the relevant information. The moderator selects one of the 
scenarios and triggers the use. The developer/owner of the class that begins the 
scenario describes the action taken by his/her object and its interaction 
with other objects. For each interaction, the owner of the class receiving the message 
describes that object’s interaction with other objects, and execution proceeds along each of 
these links. 
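
The idea of equivalence classes in the Construct step can be illustrated with the example condition, read here as x > -1 and x < 4. The sketch groups candidate values by the behavior they produce and picks one representative per class; the helper names are ours.

```python
def in_range(x):
    """The condition from the example: x > -1 and x < 4."""
    return x > -1 and x < 4


def equivalence_classes(values, behavior):
    """Group values by the observable behavior they produce."""
    classes = {}
    for value in values:
        classes.setdefault(behavior(value), []).append(value)
    return classes


classes = equivalence_classes(range(-3, 8), in_range)
# One representative per class is enough to exercise each behavior.
representatives = {outcome: members[0] for outcome, members in classes.items()}
print(classes[True])
print(representatives)
```

Following the definition in the text, all values that fail the condition fall into a single class here; a tester may still choose to sample values both below and above the range.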


Summary of Our Experience 

The Guided Inspection technique has been used in a variety of forms on a number of projects that differ in 
size, complexity and domain. The technique has been used in the usual analysis and design contexts where 
a development organization applied the technique to each model produced by an increment team. It has 
also been used in limited engagements where a domain model or an architectural model was the only 
artifact being evaluated. Our experience and that of knowledgeable clients is that this technique has greater 
defect finding power than other widely used inspection techniques. The technique does require more effort 
(the construction of test cases) than other inspection techniques but it is effort that would be expended 
anyway. 

The use of test cases brings a logical continuity to the inspection. Each step in the test case is a logical 
consequence (rather than a syntactic necessity) of the previous steps. This guides the inspectors through 
the material to be inspected in a path that allows them to judge the semantic validity of the model in 
addition to evaluating its syntactic correctness. The result is that the defects that are found have the 
potential of greater impact on the system than the syntactic bugs found in a sequential search. 

Conclusion 

We have presented an overview of guided inspection. This quality technique provides a means for 
examining models and code in a semantically meaningful way rather than examining disjoint pieces of 
syntax. Detecting defects in the early analysis and design models makes a major contribution to the quality 
of the application and to an on-time, on-budget delivery. Our presentation at the workshop will elaborate 
on the steps in the basic process and illustrate the models being inspected. 

A Process Definition for 
Guided Inspection 

John D. McGregor 
Melissa L. Major 
Software Architects 

Goal: To identify defects in artifacts created 
during the analysis and design phases of 
software construction. 

Steps in the Process 

1. Define the scope of the Guided Inspection 

2. Identify the basis model(s) from which the 
material being inspected was created 

3. Assemble the GI team 

4. Define a sampling plan and coverage criteria 

5. Create test cases from the bases 

6. Apply the checklist and tests to the material 

7. Gather and analyze test results 

8. Report and feedback 

Detailed Step Descriptions 

Define the scope of the Guided 
Inspection 

Inputs: 

The project’s position in the life cycle. 

The materials produced by the project (UML 
models, plans, use cases). 

Outputs: 

A specific set of diagrams and documents that 
will be the basis for the evaluation. 

Method: 

Define the scope of the GI to be the set of 
deliverables from a phase of the development 
process. Use the development process 
information to identify the deliverables that 
will be produced by the phase of interest. 

Example: 

The project has just completed the domain 
analysis phase. The development process 
defines the deliverable from this phase as a 
UML model containing domain-level use 
cases, static information such as class 
diagrams and dynamic information such as 
sequence and state diagrams. The GI will 
evaluate this model. 


Identify the basis model(s) from 
which the material being inspected 
was created 

Inputs: 

The scope of the GI. 

The project’s position in the life cycle. 

Outputs: 

The material from which the test cases will be 
constructed (The Model Under Test - MUT) 

Method: 

Review the development process description 
to determine the inputs to the current phase. 
The basis model(s) should be listed as inputs 
to the current phase. 

Example: 

The input to the domain analysis phase is the 
“knowledge of experts familiar with the 
domain”. These mental models are the basis 
models for this GI. 

Assemble the GI team 

Inputs: 

The scope of the GI. 

Available personnel. 

Outputs: 

A set of participants and their roles. 

Method: 

Assign persons to fill one of three categories 
of roles: Administrative, Participant in creating 
the model to be tested, and Objective observer of 
the model to be tested. Choose the objective 
observers from the customers of the model to 
be tested and the participants in the creation of 
the basis model. 

Example: 

Since the model to be tested is a domain 
analysis model and the basis model is the 
mental models of the domain experts, the 
objective observers can be selected from other 
domain experts and/or from application 
analysts. The creation participants are 
members of the domain modeling team. The 
administrative personnel can perhaps come 
from other interested parties or an office that 
provides support for the conduct of GIs. 



Define a sampling plan and 
coverage criteria 

Inputs: 

The project’s quality plan. 

Outputs: 

A plan for how test cases will be selected. 

A description of what parts of the MUT will 
be covered. 

Method: 

Identify important elements of this MUT. 
Estimate the required effort to involve all of 
these in the GI. If there are too many to cover, 
use information such as the RISK section of 
the use cases or the judgement of experts to 
prioritize the elements. 

Example: 

In a domain model there are static and 
dynamic models as well as use cases. At least 
one test case should be created for each use 
case. There should be sufficient test cases to 
take every “major” entity through all of its 
visible states. 
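
One possible way to turn such a prioritization into a sampling plan is sketched below. It is an illustration under our own assumptions, not part of the process definition: each use case carries an integer frequency and an integer criticality weight, every use case receives at least one test case, and the remaining test cases are shared in proportion to frequency times weight. The use case names and numbers are hypothetical.

```python
def allocate_test_cases(use_cases, budget):
    """Give every use case one test case, then share the rest by weighted frequency.

    use_cases maps a name to (frequency, criticality weight), both integers.
    """
    if not use_cases:
        return {}
    if budget < len(use_cases):
        raise ValueError("budget is too small for one test case per use case")
    scores = {name: freq * weight for name, (freq, weight) in use_cases.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    total = sum(scores.values())
    remaining = budget - len(use_cases)
    allocation = {name: 1 for name in use_cases}
    if total > 0:
        for name in ranked:
            allocation[name] += remaining * scores[name] // total
    # Hand out what integer division left over, highest score first.
    leftover = budget - sum(allocation.values())
    for i in range(leftover):
        allocation[ranked[i % len(ranked)]] += 1
    return allocation


plan = allocate_test_cases(
    {"Place order": (60, 3), "Track shipment": (30, 1), "Cancel order": (10, 2)},
    budget=10,
)
print(plan)
```

The weights would normally come from the RISK section of the use cases or from the judgement of experts, as described in the Method above.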


Create test cases from the bases 

Inputs: 

The sampling plan. 

MUT 

Outputs: 

A set of test cases. 

Method: 

Obtain a scenario from the basis model. 
Determine the pre-conditions and inputs that 
are required to place the system in the correct 
state and to begin the test. Present the scenario 
to the “oracle” to determine the results 
expected from the test scenario. Complete a 
test case description for each test case. 

Example: 

A different domain expert than the one who 
supported the model creation would be asked 
to supply scenarios that correspond to uses of 
the system. The experts also provide what 
they would consider an acceptable response. 
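
A test case description of this kind can be captured in a small record. The field names below are our own assumption of what such a description might contain, based on the Method above; in a real session the pass/fail judgement is made by people, so the string comparison is only a stand-in.

```python
from dataclasses import dataclass


@dataclass
class GuidedTestCase:
    scenario: str            # taken from the basis model
    preconditions: list      # state the system must be in before the test
    inputs: dict             # exact values that make the scenario specific
    expected_result: str     # supplied by the oracle
    actual_result: str = ""  # filled in during the inspection session

    def verdict(self):
        if not self.actual_result:
            return "not run"
        return "pass" if self.actual_result == self.expected_result else "fail"


tc = GuidedTestCase(
    scenario="Customer places an order for an item that is in stock",
    preconditions=["customer is registered", "item is in stock"],
    inputs={"quantity": 2},
    expected_result="order confirmed",
)
print(tc.verdict())
tc.actual_result = "order confirmed"
print(tc.verdict())
```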


Apply the checklist and tests to the 
material 

Inputs: 

Set of test cases. 

Checklist for the type of model being 
inspected. 

MUT 

Outputs: 

Set of test results. 

Completed checklist. 

Method: 

Apply the test cases to the MUT using the 
most specific technique available. For UML 
models in a static environment, such as 
Rational Rose, an interactive simulation 
session in which the Creators play the roles of 
the model elements is the best approach. If the 
MUT is represented by an executable 
prototype then the test cases are mapped onto 
this system and executed. After the model has 
been thoroughly examined, complete the 
checklist. 

Example: 

The domain analysis model is a static UML 
model. A simulation session is conducted with 
the Observers feeding test cases to the 
Creators. The Creators provide details of how 
the test scenario would be processed through 
the model. Sequence diagrams are used to 
document the execution of each test case. Use 
agreed upon symbols or colors to mark each 
element that is touched by a test case. 

Gather and analyze test results & 
coverage 

Inputs: 

Test results in the form of sequence diagrams 
and pass/fail decisions. 

The marked-up model. 

Outputs: 

Statistics on percentage pass/fail. 
Categorization of the results. 

Defect catalogs and defect reports. 

A judgement of the quality of the MUT and 
the tests. 



Method: 

Begin by counting the number of test cases 
that passed and the number that failed. 
Compare this ratio to other GIs that have been 
conducted in the organization. Compute the 
percentage of each type of element that has 
been used in executing the test cases. Use the 
marked-up model as the source of this data. 
Update the defect inventory with information 
about the failures from this test session. 

Categorize the failed test cases. This can often 
be combined with the previous two tasks by 
marking paper copies of the model. Follow 
the sequence diagram for each failed test case 
and mark each message, class and attribute 
touched by a failed test case. 

Example: 

For the domain analysis model we should be 
able to report that every use case was the 
source of at least one test case and that every class 
in the class diagram was used at least once. 
Typically on the first pass, some significant 
states will be missed. This should be noted in 
the coverage analysis. 
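
The counting described in the Method can be sketched as below. We assume that the verdicts have been recorded per test case and that the marked-up model has been transcribed into sets of element names per element type; all names are hypothetical.

```python
from collections import Counter


def summarize(results, model_elements, touched):
    """Return the pass rate and the coverage of each type of model element.

    results: test id -> "pass" or "fail"
    model_elements: element type -> set of element names in the model
    touched: element type -> set of element names marked during the session
    """
    counts = Counter(results.values())
    total = len(results)
    pass_rate = counts["pass"] / total if total else 0.0
    coverage = {}
    for kind, names in model_elements.items():
        used = touched.get(kind, set()) & names
        coverage[kind] = len(used) / len(names) if names else 1.0
    return pass_rate, coverage


results = {"TC1": "pass", "TC2": "fail", "TC3": "pass", "TC4": "pass"}
model = {"class": {"Customer", "Order"},
         "state": {"Open", "Closed", "Suspended", "Archived"}}
marked = {"class": {"Customer", "Order"}, "state": {"Open", "Closed"}}
pass_rate, coverage = summarize(results, model, marked)
print(pass_rate, coverage)
```

In this made-up data every class was used but half of the states were missed, which is the kind of first-pass result the Example anticipates.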


Report and feedback 

Inputs: 

Test results. 

Coverage information. 

Outputs: 

Information on what new tests should be 
created. 

Test report. 

Method: 

Follow the standard format for a test report in 
your organization to document the test results 
and the analyses of those results. If the stated 
coverage goals are met then the process is 
complete. If not, use that report to return to 
step 5 and proceed through the steps to 
improve the coverage level. 

Example: 

For the domain analysis tests, some elements 
were found to be missing from the model. The 
failing tests might be executed again after the 
model has been modified. 
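
The decision at the end of this step can be written down as a simple comparison of achieved coverage against the stated goals. This is a sketch under our assumptions; the element types and goal values are hypothetical.

```python
def next_action(coverage, goals):
    """Compare achieved coverage with the stated goals for each element type."""
    shortfalls = {
        kind: goal - coverage.get(kind, 0.0)
        for kind, goal in goals.items()
        if coverage.get(kind, 0.0) < goal
    }
    if not shortfalls:
        return "complete", {}
    # The shortfalls indicate where new test cases are needed.
    return "return to step 5", shortfalls


goals = {"use case": 1.0, "class": 1.0, "state": 1.0}
achieved = {"use case": 1.0, "class": 1.0, "state": 0.5}
print(next_action(achieved, goals))
```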


Roles in the Process 

Administrator 

The administrative tasks include running the GI 
sessions, collecting and disseminating the 
results, and aggregating metrics to measure the 
quality of the review. In our example, personnel 
from a central office could do the administrative 
work. 


Creator 

The persons who created the MUT are the 
creators. Depending upon the form that the 
model takes, these people may “execute” the 
symbolic model on the test cases or they may 
assist in translating the test cases into a form that 
can be executed with whatever representation of 
the model is available. In our example the 
modelers who created the domain model would 
be the “creators”. 


Observer 

Persons in this role create the test cases that are 
used in the GI. In our example they would be 
domain experts and preferably experts who were 
not the source of the information used to create 
the model initially. 




ABSTRACT 

Inspections can be used to identify defects in software artifacts. In this way, inspection 
methods help to improve software quality, especially when used early in software 
development. Inspections of software design may be especially crucial since design 
defects (problems of correctness and completeness with respect to the requirements, 
internal consistency, or other quality attributes) can directly affect the quality of, and 
effort required for, the implementation. We have created a set of “reading techniques” 
(so called because they help a reviewer to “read” a design artifact for the purpose of 
finding relevant information) that gives specific and practical guidance for identifying 
defects in Object-Oriented designs. Each reading technique in the family focuses the 
reviewer on some aspect of the design, with the goal that an inspection team applying 
the entire family should achieve a high degree of coverage of the design defects. 

In this paper, we present an overview of this new set of reading techniques. We discuss 
the reading process and how readers can use these techniques to detect defects in high 
level object oriented design UML diagrams. 


Keywords: OO Design, Reading Techniques, Software Quality, and Software Inspection 


1. Introduction 

A software inspection aims to guarantee that a particular software artifact is complete, consistent, 
unambiguous, and correct enough to effectively support further system development. For 
instance, inspections have been used to improve the quality of a system’s design and code 
[Fagan76]. Typically, inspections require individuals to review a particular artifact, then meet as 
a team to discuss and record defects, which are then sent to the document’s author to be 
corrected. Most publications concerning software inspections have concentrated on improving the 
inspection meetings while assuming that individual reviewers are able to effectively detect 
defects in software documents on their own (e.g. [Fagan86, Gilb93]). However, empirical 
evidence has questioned the importance of team meetings by showing that meetings do not 
contribute to finding a significant number of new defects that were not already found by 
individual reviewers [Votta93, Porter95]. 



“Software reading techniques” attempt to increase the effectiveness of inspections by providing 
procedural guidelines that can be used by individual reviewers to examine (or “read”) a given 
software artifact and identify defects. These techniques consist of a concrete procedure that tells 
a reader what information in the document to look for. Another important component of the 
techniques is the set of questions that explicitly ask the reader to think about the information just 
uncovered in order to find defects. In previous work, we have developed families of reading 
techniques [Basili96]. There is empirical evidence that software reading is a promising technique 
for increasing the effectiveness of inspections on different types of software artifacts, not just 
limited to source code [Porter95, Basili96, Basili96b, Fusaro97, Shull98, Zhang98]. In this work, 
we concentrate specifically on inspections, for the purpose of defect detection, of high-level 
Object-Oriented (OO) design diagrams represented using UML [Fowller97]. (UML is a 
notational approach that does not define how to organize development tasks.) Figure 1 organizes 
the “problem space” to which reading techniques can be applied, and illustrates how reading 
techniques for this task (known as Traceability-Based Reading) fit with previous work. Families 
of reading techniques have been tailored to defect inspections of requirements (for requirements 
expressed in English or SCR, a formal notation) and to usability inspections of user interfaces. 

Figure 1 - Families of OO Reading Techniques 

Section 2 briefly describes object oriented design in terms of the information that is important to 
be checked during software inspections. Section 3 introduces the reading techniques, showing the 
different types of defects such techniques are intended to identify and an outline of the whole set 
of techniques. The fourth section discusses how the techniques can be used for inspecting OO 
designs. Finally, some suggestions for future work are discussed in the conclusions. 

2. Object Oriented Designs in UML 

An OO design is a set of diagrams concerned with the representation of real world concepts as a 
collection of discrete objects that incorporate both data structure and behavior. Normally, high-level 
design activities start after the software product requirements are captured. So, concepts 
must be extracted from the requirements and described using the paradigm constructs. This 
means that requirements and design documents are built at different times, using a different 
viewpoint and abstraction level. When high-level design activities are finished, the documents, 
basically a set of well-related diagrams, can be inspected to verify whether they are consistent 
among themselves and if the requirements were correctly and completely captured. High-level 
design activities deal with the problem description but do not consider the constraints regarding 
it. That is, these activities are concerned with taking the functional requirements and mapping 
them to a new notation or form, using the paradigm constructs to represent the system via design 
diagrams instead of just a textual description. Such an approach allows developers to understand 
the problem rather than to try to solve it. 

Low-level design activities deal with the possible solutions for the problem; they depend on the 
results from the high-level activities and nonfunctional requirements, and they serve as a model 
for the code. Our interest is to define reading techniques that could be applied on high-level 
design documents. We feel that reviews of high-level designs may be especially valuable since 
they help to ensure that developers have adequately understood the problem before defining the 
solution. Since low-level designs use the same basic diagram set as the high-level design, but 
in more detail, reviews of this kind can help ensure that low-level design starts from a 
high-quality base. 

More specifically, the reading techniques investigated in this work are tailored to inspections of 
documents using UML notation. UML diagrams capture the static and dynamic view of the real 
world as described by the object-oriented constructs. We focused our reading techniques on the 
following high-level design diagrams: class, interaction (sequence and collaboration), state 
machine and package. Usually, these are the main UML diagrams that developers build for 
high-level OO design. They capture the static and dynamic views of the problem, and even allow the 
teamwork to be organized, based on packaging information. The design content needs to be 
compared against the requirements, which can likewise be described using a number of separate 
diagrams to capture different aspects. In particular, we expect that there will be a textual 
description of the functional requirements that may also describe certain behaviors using more 
specialized representations such as use-cases [Jacobson95]. 

Thus, we identify the following as important sources of information for ensuring the quality of a 
UML high level design: 

• A set of functional requirements that describes the concepts and services that are necessary in 
the final system; 

• Use cases that describe important concepts of the system (which may eventually be 
represented as objects, classes, or attributes) and the services it provides; 

• A class diagram (possibly divided into packages) that describes the classes of a system and 
how they are associated; 

• A set of class descriptions that lists the classes of a system along with their attributes and 
behaviors; 

• Sequence diagrams that describe the classes, objects, and possibly actors of a system and how 
they collaborate to capture services of the system; 

• State diagrams that describe the internal states in which a particular object may exist, and the 
possible transitions between those states. 



3. Reading Techniques for high-level design 


Each reading technique can be thought of as a set of procedural guidelines that reviewers can 
follow, step-by-step, to examine a set of diagrams and detect defects. The types of defects on 
which our techniques are focused, as listed in Table 1, are based on earlier work with 
requirements inspections. The defect taxonomy is important since it helps focus the kinds of 
questions reviewers should answer during an inspection. 


Type of Defect            Description 

Omission                  One or more design diagrams that should contain some concept from 
                          the general requirements or from the requirements document do not 
                          contain a representation for that concept. 

Incorrect Fact            A design diagram contains a misrepresentation of a concept described 
                          in the general requirements or requirements document. 

Inconsistency             A representation of a concept in one design diagram disagrees with a 
                          representation of the same concept in either the same or another 
                          design diagram. 

Ambiguity                 A representation of a concept in the design is unclear, and could cause 
                          a user of the document (developer, low-level designer, etc.) to 
                          misinterpret or misunderstand the meaning of the concept. 

Extraneous Information    The design includes information that, while perhaps true, does not 
                          apply to this domain and should not be included in the design. 


Table 1 - Types of software defects, and their specific definitions for OO designs 


We defined one reading technique for each pair or group of diagrams that could usefully be 
compared against each other. For example, use cases needed to be compared to interaction 
diagrams to detect whether the functionality described by the use case was captured and all the 
concepts and expected behaviors regarding this functionality were represented. The full set of our 
reading techniques is defined as illustrated in Figure 2, which differentiates horizontal¹ 
(comparisons of documents within a single lifecycle phase) from vertical² (comparisons of 
documents between phases) reading. 



Figure 2 - Set of OO Reading Techniques 


¹ Consistency among documents is the most important feature here. 

² Traceability between the phases is the most important feature here. 
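
The distinction between the two kinds of reading can be expressed as a simple rule on the lifecycle phase of the two documents being compared. The sketch below is only an illustration: the assignment of artifacts to phases follows the description in this paper (requirements and use cases on one side, design diagrams on the other), while the dictionary and function names are ours.

```python
# Lifecycle phase of each artifact, as assumed for this sketch.
PHASE = {
    "requirements description": "requirements",
    "use cases": "requirements",
    "class diagram": "high-level design",
    "class descriptions": "high-level design",
    "sequence diagrams": "high-level design",
    "state diagrams": "high-level design",
}


def reading_kind(artifact_a, artifact_b):
    """Horizontal reading stays within one phase; vertical reading crosses phases."""
    same_phase = PHASE[artifact_a] == PHASE[artifact_b]
    return "horizontal" if same_phase else "vertical"


print(reading_kind("class diagram", "sequence diagrams"))
print(reading_kind("use cases", "sequence diagrams"))
```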

Initial validation of these techniques was accomplished by means of a study [Shull99, 
Travassos99] that provided evidence for the feasibility of these techniques. Using the techniques 
did allow teams to detect defects, and in general subjects agreed that the techniques were helpful. 
Also, the vertical techniques tended to find more defects of omitted and incorrect functionality, 
while the horizontal techniques tended to find more defects of ambiguities and inconsistencies 
between design documents, lending some credence to the idea that the distinction between 
horizontal and vertical techniques is real and useful [Travassos99]. 

Further studies have been undertaken to improve the practical applicability of the techniques. As 
a result of specific feedback from the feasibility study, we developed a second version of the 
techniques and studied them using an observational approach (i.e., using experimental methods 
suitable for understanding the process by which subjects apply the techniques) [Travassos99b]. 
The feasibility study had identified global issues for improvement, that is, issues that affected the 
entire process, such as the amount of semantic versus syntactic checking. The observational 
approach was necessary to understand what improvements might be necessary at the level of 
individual steps, for example, whether subjects experience difficulties or misunderstandings 
while applying the technique (and how these problems may be corrected), whether each step of 
the technique contributes to achieving the overall goal, and whether the steps of the technique 
should be reordered to better correspond to subjects’ own working styles. Detailed information
about the results and also an improved version of the techniques can be found in [Shull99b]. 

4. Using OO Reading Techniques for inspecting OO Design 

In this section we explore the application of the reading techniques in an inspection process. 
While horizontal reading aims to identify whether all of the design artifacts are describing the 
same system, vertical reading tries to verify whether those design artifacts represent the right 
system, which is described by the requirements and use-cases. So, the goal is that when all the 
techniques are used together, then all the quality issues in the design are covered. The 
development team can use the whole set of the techniques, but if some design artifacts do not 
exist, there is no impact on the design inspection process. A subset or reordering of the 
techniques may also be chosen based on important attributes of the design to be reviewed. This is 
particularly interesting when developers are dealing with specialized application domains. For 
example, consider a system whose functionality is based mainly on its reaction to stimuli where 
state machine diagrams are common. In this situation, it could be beneficial to use the reading 
techniques that focus on state machine diagrams before using the reading techniques that focus 
on the other design diagrams. For conventional systems, such as database systems, the semantic 
model of the information and the flow of the transactions seem to be the important information. 
Therefore, a subset of the techniques could be picked that focus on this information. In this
situation, first reading the class diagram against the sequence diagrams and then continuing
with the rest of the techniques seems to be a good idea.



To organize the reading process, reading responsibilities can be distributed among the members
of the inspection team, reducing the reading effort per team member and improving the reading
process. In this way, each one of the readers can apply a reduced number of reading techniques,
or even deal with a reduced number of artifacts at the same time. After individual review, it is
important to organize a meeting in order to review each one of the individual defect lists and to
create a final list that reflects a group consensus of the defects in the documents. It is not
necessary to apply the techniques in a particular order, but it seems to be reasonable to apply first
horizontal reading for all existing design artifacts and then vertical reading, to ensure that a
consistent system description is checked against the requirements. Figure 3 shows an example of
how the techniques could be organized among a team of three reviewers.
Figure 3 - Organizing reading with 3 readers

To support these two types of reading (horizontal and vertical) we have introduced some new 
terminology to describe the actions of the system. First, because the level of abstraction and 
granularity of the information in the requirements and use-cases is different from the abstraction 
and information in the design artifacts, the concept of system functionality was broken down into 
three complementary concepts (messages, services, and functionality). Messages are the very 
lowest-level behaviors out of which system services and, in turn, functionalities are composed. 
They represent the communication between objects that work together to implement system 
behavior. Messages may be shown on sequence diagrams and must be associated with class 
behaviors. Services are combinations of one or more messages and usually capture some basic 
activity necessary to accomplish a functionality. They can be considered low-level actions 
performed by the system. They are the “atomic units” out of which system functionalities are 
composed. A service could be used as a part of one or more functionalities. We use the term 
“functionality” to describe the behavior of the system from the user’s point of view, in other 
words, the functionality that the user expects to be visible. A functionality is composed of one or 
more services. Users do not typically consider services an end in themselves; rather, services are 
the steps by which some larger goal or functionality is achieved. 
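
The composition of messages, services, and functionalities described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the field names, and the retail-fuel example objects are hypothetical and are not part of the reading techniques themselves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    """Lowest-level behavior: a communication between two objects."""
    sender: str
    receiver: str
    name: str


@dataclass
class Service:
    """A low-level system action built from one or more messages."""
    name: str
    messages: List[Message]

    def __post_init__(self) -> None:
        if not self.messages:
            raise ValueError("a service needs at least one message")


@dataclass
class Functionality:
    """User-visible behavior composed of one or more services."""
    name: str
    services: List[Service]

    def __post_init__(self) -> None:
        if not self.services:
            raise ValueError("a functionality needs at least one service")

    def messages(self) -> List[Message]:
        """Flatten the functionality down to the messages that realize it."""
        return [m for s in self.services for m in s.messages]


# Hypothetical example: one service reused by two functionalities.
authorize = Service(
    "authorize payment",
    [Message("Terminal", "Bank", "request_authorization")],
)
dispense = Service(
    "dispense fuel",
    [
        Message("Terminal", "Pump", "enable"),
        Message("Pump", "Terminal", "report_volume"),
    ],
)
pay_at_pump = Functionality("pay at pump", [authorize, dispense])
prepay_inside = Functionality("prepay inside", [authorize])
```

In this sketch a service may appear in several functionalities, which mirrors the statement that a service could be used as a part of one or more functionalities.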

A second important piece of terminology is that of conditions and constraints. A condition 
describes what must be true for the functionality to be executed. A constraint must always be true 
for system functionality. This information is important to readers comparing different diagrams
since it describes how the functionality must be implemented; this information is important to
maintain with the functionality it describes. 


B.2 Reading 2 — State diagrams x Class description 

Goal: To verify that the classes are defined in a way that can capture the functionality specified by the 
state diagrams. 

Inputs to Process: A set of class descriptions that lists the classes of a system along with their attributes and 
behaviors and a state diagram that describes the internal states in which an object may exist, and the possible 
transitions between states. 

For each state diagram, perform the following steps: 

1) Read the state diagram to understand the possible states of the object and the actions that trigger 
transitions between them. 

2) Find the class or class hierarchy, attributes, and behaviors on the class description that 
correspond to the concepts on the state diagram. 

3) Compare the class diagram to the state diagram to make sure that the class, as described, 
can capture the appropriate functionality. 

Using your semantic knowledge of this class and the behaviors it should encapsulate, are all states 
described? If not, you have uncovered a defect of incorrect fact, that is, the class as described 
cannot behave as it should. 

Is there some unstarred state? Could you evaluate the importance of this state? Does it really 
describe an essential object state? Is the state feasible considering all actions and constraints 
surrounding it? If yes, probably something is missing on the class diagram and there is an 
inconsistency between the diagrams. Otherwise, an extraneous fact should be reported. 

Is there some unstarred event? If yes, fill in a defect record showing the inconsistency between the 
class description and state diagram. 

Is there some unstarred constraint? Is the constraint directly concerned with some object data? If 
yes, fill in a defect record showing the information that has been omitted from the class 
description. 

Figure 4 - An excerpt of a Horizontal Reading 


The main idea in applying horizontal reading is to understand whether all the high level design
artifacts are representing the same system. We must keep in mind that the artifacts should model
the same system information but from different perspectives. UML organizes the artifacts and
different types of information based on the type of system information they contain. There are
specific artifacts to capture essentially static information (basically, the structure assumed by the
domain’s objects while playing specific roles in the problem domain) and specific artifacts to
capture essentially dynamic information (basically, the consequences when objects are asked to
behave in order to accomplish system functionalities). These different views are useful and
together allow developers to understand what is going on with the objects and how they are
accomplishing the required functionalities in the context of the problem. However, these
differences among the diagrams make the inspection process a bit more complicated. For
instance, when comparing sequence diagrams against state machine diagrams two different
perspectives must be combined to interpret and identify possible defects. Each one of the
sequence diagrams represents some system objects and the messages exchanged between
them that implement some functionality required by the user while, on the other hand, the state
machine diagram is a picture of what happens to one object when it is influenced by the events
occurring in multiple sequence diagrams. Sequence diagrams show the specific messages
exchanged by objects, while state diagrams show how the system responds to events, which can
be messages, services, or functionality. Both diagrams must convey information about conditions
and constraints on the functionality. So, the horizontal reading techniques explore these types of
differences and help reduce the semantic gap between the documents. Figure 4 shows an excerpt
from a horizontal reading technique highlighting the concerns for each one of the reading steps
(some details are omitted).
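
The clerical part of the excerpt in Figure 4, listing the state diagram items that the reader could not star, might be sketched as follows. This is a minimal sketch and not the technique itself: the function name, the dictionary layout, and the example items are assumptions, and the semantic judgment on each listed item still belongs to the reader.

```python
from typing import Dict, List, Set, Tuple

# Follow-up the reader still has to perform for each kind of unstarred item,
# paraphrased from the questions in the Figure 4 excerpt.
FOLLOW_UP = {
    "state": "judge whether the state is essential and feasible",
    "event": "record an inconsistency with the class description",
    "constraint": "if it concerns object data, record an omission",
}


def unstarred_items(
    diagram: Dict[str, Set[str]],
    starred: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """List diagram items with no counterpart marked on the class description."""
    findings = []
    for kind in ("state", "event", "constraint"):
        missing = diagram.get(kind, set()) - starred.get(kind, set())
        for item in sorted(missing):
            findings.append((kind, item, FOLLOW_UP[kind]))
    return findings


# Hypothetical state diagram content and the items the reader starred.
diagram = {
    "state": {"idle", "authorizing", "dispensing"},
    "event": {"card_inserted", "nozzle_lifted"},
    "constraint": {"volume <= preset"},
}
starred = {
    "state": {"idle", "dispensing"},
    "event": {"card_inserted", "nozzle_lifted"},
    "constraint": set(),
}
findings = unstarred_items(diagram, starred)
```

Each entry in `findings` is a candidate for a defect record, not a confirmed defect.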


B.7 Reading 7 — State Diagrams x Requirements Description and Use-cases 

Goal: To verify that the state diagrams describe appropriate states of objects and events that trigger 
state changes as described by the requirements and use cases. 

Inputs to process: The set of all state diagrams, each of which describes an object in the system. A set of 
functional requirements that describes the concepts and services that are necessary in the final system and the set 
of use cases that describe the important concepts of the system.

For each state diagram, do the following steps: 

1) Read the state diagram to basically understand the object it is modeling. 

2) Read the requirements description to determine the possible states of the object, which states 
are adjacent to each other, and events that cause the state changes. 

3) Read the Use cases and determine the events that can cause state changes. 

4) Read the state diagram to determine if the states described are consistent with the 
requirements and if the transitions are consistent with the requirements and use cases.

Were you able to find all of the states? 

If a state is missing, look to see if two or more states that you marked in the requirements were 
combined into one state on the state diagram. If not, then you have found a defect of Omission. If so, then 
does this combination make sense? If not, you have found a defect of Incorrect Fact. 

Were there extra states in the state diagram? 

Look to see if one state that you marked in the requirements has been split into two or more states in 
the state diagram. If not, then you have found a defect of Extraneous. If so, does this split make sense? If 
not, you have found a defect of Incorrect Fact. 

Do all of the events on the adjacency matrix appear on the state diagram? If not, you have found a 
defect of omission. Do events appear on the state diagram that are not on the adjacency matrix? If so, 
you have found a defect of extraneous fact. 

Did you find all of the constraints that are on the adjacency matrix? If not, then you have found a 
defect of omission. Did you find a constraint on the state diagram that is not on the adjacency matrix? If 
so, does the constraint make sense? If not then you have found a defect of extraneous fact. 

Figure 5 - An excerpt of a Vertical Reading 


To apply vertical reading, readers should be aware of the differences between the two lifecycle
phases in which the documents were created and how the traceability between these two different 
phases could be explored. The levels of abstraction and information representation between these 
phases are quite different. Requirements and use cases should precisely describe the problem and 
thus use a totally different representation than the design artifacts. Moreover, usually the entire 
problem definition is presented using these two types of document. There is no separation of 
concerns and no direct mapping from one phase (specification) to another (design). Vertical 
reading techniques explore such ideas and provide some guidance to help the reader identify the 
information s/he needs. For example, the requirements descriptions and use cases capture the 
functionality of the entire system and in some cases the services, but not the messages. Designers 
using these requirements and use cases decide about the messages based on the viewpoint 
(abstraction) used to classify and organize the classes. Sequence diagrams are organized based on 
messages that work together in some way to provide the services, which compose the required 
functionality. Requirements and use cases describe constraints and conditions in general terms;
on a sequence diagram such information must be made explicit and associated with the
appropriate messages. So, vertical reading techniques explore these types of differences by 
defining some guidelines for tracing the right information between these two lifecycle phases. 
Figure 5 shows an excerpt from a vertical reading technique highlighting the concerns for each 
one of the reading steps (some details are omitted). 
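
The comparison between the adjacency matrix built from the requirements and the state diagram (Figure 5) can be sketched as a set difference. The representation below is an assumption made for illustration: each matrix maps a (from state, to state) pair to the set of events that trigger the transition, and the example states and events are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Dict[Tuple[str, str], Set[str]]


def states_of(matrix: Adjacency) -> Set[str]:
    """All states mentioned as the source or target of a transition."""
    return {state for pair in matrix for state in pair}


def compare_states(required: Adjacency, drawn: Adjacency) -> Dict[str, List[str]]:
    """Candidate missing and extra states; the reader must still check
    whether states were combined or split before recording a defect."""
    req, dia = states_of(required), states_of(drawn)
    return {"missing": sorted(req - dia), "extra": sorted(dia - req)}


def compare_events(required: Adjacency, drawn: Adjacency) -> List[Tuple[str, str, str, str]]:
    """Events absent from the diagram (omission) or absent from the
    requirements-derived matrix (extraneous), per transition."""
    findings = []
    for pair in sorted(set(required) | set(drawn)):
        wanted = required.get(pair, set())
        shown = drawn.get(pair, set())
        for event in sorted(wanted - shown):
            findings.append(("omission", pair[0], pair[1], event))
        for event in sorted(shown - wanted):
            findings.append(("extraneous", pair[0], pair[1], event))
    return findings


# Hypothetical matrices for one object.
required = {
    ("idle", "authorizing"): {"card_inserted"},
    ("authorizing", "dispensing"): {"approved"},
    ("authorizing", "idle"): {"declined"},
}
drawn = {
    ("idle", "authorizing"): {"card_inserted"},
    ("authorizing", "dispensing"): {"approved"},
    ("dispensing", "idle"): {"nozzle_returned"},
}
state_findings = compare_states(required, drawn)
event_findings = compare_events(required, drawn)
```

The output only lists candidates. Deciding whether a combination or split of states makes sense, and hence whether the defect is an omission, an extraneous fact, or an incorrect fact, remains a semantic judgment.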

A full description of the entire set of techniques, including the ones referred to here, can be found 
in [Shull99b], which is accessible via the web. 

5. Ongoing Work 

The Object Oriented reading techniques (OORTs) have been, and still are, evolving since their
first definition. New issues and improvements have been included based on the feedback of
readers and volunteers. Throughout this process, we have been trying to capture new features and
to understand whether the latest version of the reading techniques keeps its feasibility and
interest. We have found observational techniques useful, because they have allowed us to follow
the reading process as it occurred, rather than trying to interpret the readers’ post-hoc answers as
we have done in the past. Observing how readers normally try to read diagrams challenged many
of our assumptions about how our techniques were actually being applied.

However, two important questions remain open in this area. First, the role of domain knowledge 
is not yet well understood for these two sets of reading techniques, especially for horizontal 
reading. Since horizontal reading is a largely syntactic check of consistency between two design 
diagrams, it is not expected to require domain knowledge. Still, it has been observed that a reader 
possessing some knowledge about the problem domain seemed to be more effective than a reader 
who does not have the same level of knowledge. Some empirical investigation into exactly how 
domain knowledge plays a role in this type of reading could help us better understand and thus 
better support the process. The second question regards the level of automated support that 
should be provided for such techniques. The observational studies have allowed us to understand 
which steps of the techniques can feel especially repetitive and mechanical to the reader. So, the 
clerical activities regarding the reading process using OORTs must be precisely defined and 
identified. For this situation, further observational studies play an important role and they should 
be executed aiming to collect suggestions on how to automate the clerical activities concerned 
with OORTs. 

Currently, the techniques are undergoing experimental evaluation, which is aimed at evolving
them. In each experiment we explore a different issue regarding the techniques in order to evolve
them or understand them at a deeper level. This series of experiments is an evolutionary process.
The feedback from the readers and the observation of the techniques' usage are playing an
important role as we work towards a useful and feasible set of reading techniques for OO design.
The results of these experiments will be published in future publications, which will be available
at http://www.cs.umd.edu/projects/SoftEng/ESEG.
Abstract

There exist a number of different approaches, often called
frameworks, supporting software process improvement (SPI).
Their differences and similarities have been the subject of
some debate. This paper discusses four different classes of
methods, which can be used to compare SPI frameworks.
One of these methods is a new taxonomy proposed in this
paper.

1. Introduction 

Focus on software process improvement (SPI) is growing. The
underlying assumption of SPI is that product quality is
influenced by the quality of the process used to produce it:

Quality(Process) => Quality(Product)

This causal relation may seem trivial at first, but in reality
there are numerous variations in the approach to SPI.
These approaches are often called SPI frameworks and they
generally describe how organizations can assess current
process quality, as well as how they can improve it. Most
frameworks are rather comprehensive and differences in
content are evident in a number of aspects, e.g. focus,
goals, adaptability and so on. There are even subtle
differences in their interpretation of words like quality and
process.

However, the SPI framework differences may not be
apparent at first, and because the frameworks are so
comprehensive, it is costly to investigate them all. The
result is that the differences, which set one framework
apart from another, are not clear. Evidently, systematic
methods to compare the frameworks are needed. The
question is how this can be done efficiently, objectively and in
a way that is possible to validate.

1.1 Why Compare SPI Frameworks? 

Comparing SPI frameworks can be rewarding from an 
academic view. However, focus should not be on the 
frameworks themselves, but on real improvements 
resulting from their adoption. SPI framework comparisons 
should therefore provide practical insight and guidance 
when selecting which framework to employ in a software- 
producing organization. It should be clear that no single 
“right” comparison method exists for this purpose, and a 
combination of methods may be necessary depending on
the context. The primary usability requirements to be
considered are:


• Knowledge-level — The amount of detail in the
comparison should correspond to the knowledge-level
of the user.

• Point of view — The comparison method can be general
or take the standpoint of a specific framework and
view others in terms of that.

How these requirements are satisfied depends on the 
reason for comparing the SPI frameworks. An 
organization without prior SPI knowledge may wish to
institutionalize improvement work because of competitive 
pressure or certification requirements - but which 
framework is appropriate? On the other hand, an 
organization with an SPI framework in place may wish to
adopt more than one approach — but how can this be 
done with the least amount of redundancy? In the latter 
case working knowledge about one specific framework 
exists, but knowledge about other approaches may not be 
as thorough. 

2. Comparison Methods 

There is an increasing amount of literature comparing the
major SPI frameworks. Most of it was written in the last three
years and generally covers only a small number of
frameworks, e.g. [1][2][3].

From our review of other comparison work we have
recognized four main classes of comparison methods.
These are described briefly in the following
subsections.

2.1 Characteristics Comparison Method

A comparison method well suited for a general overview is 
the use of characteristics. The characteristics can be nominal, 
ordinal or absolute and should preferably be objective, 
measurable and comparable. However, the main point is 
that they represent areas of interest for the SPI framework 
investigation. 

The frameworks are compared in terms of the defined 
characteristics and the results can be presented in a tabular 
format. This gives us a compact and high-level
comparison method with little detail. Further detail must be
collected elsewhere, e.g. using another comparison
method.

The taxonomy we propose in section 3 is based on the 
characteristics comparison method. 
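
As a rough illustration of the tabular presentation, the sketch below renders a plain-text comparison table from per-framework characteristic values. The framework names and values are placeholders, not findings about any real SPI framework, and the function name is an assumption.

```python
from typing import Dict, List


def comparison_table(
    frameworks: Dict[str, Dict[str, str]],
    characteristics: List[str],
) -> str:
    """Render one row per characteristic and one column per framework."""
    names = list(frameworks)
    rows = [["Characteristic"] + names]
    for characteristic in characteristics:
        # A dash marks a characteristic that has not been filled in yet.
        rows.append(
            [characteristic]
            + [frameworks[name].get(characteristic, "-") for name in names]
        )
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


# Placeholder data for two unnamed frameworks.
frameworks = {
    "Framework A": {"Assessor": "external", "Progression": "staged"},
    "Framework B": {"Assessor": "internal"},
}
table = comparison_table(frameworks, ["Assessor", "Progression"])
print(table)
```

Gaps show up as dashes, which makes it easy to see where details must be collected using another comparison method.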



2.2 Framework Mapping Comparison Method 

Framework mapping is the process of creating a map from
statements or concepts of one framework to those of 
another. This requires that the actual frameworks are 
rather formalized, i.e. consist of a more or less defined set 
of statements or requirements. 

In the characteristics method the goal was to describe 
important attributes of each SPI framework, i.e. areas of 
interest. However, the purpose of mapping is to identify 
overlaps and correlation between frameworks and create a 
map of these. There can exist strong, weak or no 
correlation as suggested by Tingey [3]. Furthermore, the 
mapping can be done on either a high or a low level 
depending on the amount of detail included. In either case, 
it is more low-level than characteristics and thus not very
useful for a general overview. 

Framework mapping is especially useful when an 
organization employs two or more different SPI 
frameworks, as corresponding statements can be identified 
and redundancy reduced. Thus the extra effort needed to 
employ more than one framework is minimized. 
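
A framework map with strong, weak, or no correlation between statements could be represented as follows. This is a sketch under assumptions: the statement identifiers (A.1, B.2, and so on) are invented, and treating only strong correlation as removable redundancy is a simplification chosen for the example.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Correlation(Enum):
    NONE = 0
    WEAK = 1
    STRONG = 2


# (statement in framework A, statement in framework B) -> correlation.
# Pairs that are not listed are treated as having no correlation.
Mapping = Dict[Tuple[str, str], Correlation]


def coverage(mapping: Mapping, targets: List[str]) -> Dict[str, Correlation]:
    """Best correlation that any source statement has with each target."""
    best = {target: Correlation.NONE for target in targets}
    for (_, target), strength in mapping.items():
        if target in best and strength.value > best[target].value:
            best[target] = strength
    return best


def extra_effort(mapping: Mapping, targets: List[str]) -> List[str]:
    """Target statements not strongly covered by the framework in place."""
    return [
        target
        for target, strength in coverage(mapping, targets).items()
        if strength is not Correlation.STRONG
    ]


# Hypothetical map from framework A (in place) to framework B (to adopt).
mapping = {
    ("A.1", "B.2"): Correlation.STRONG,
    ("A.3", "B.2"): Correlation.WEAK,
    ("A.3", "B.4"): Correlation.WEAK,
}
remaining = extra_effort(mapping, ["B.1", "B.2", "B.4"])
```

Here `remaining` lists the statements of the second framework that still need dedicated work, which is the extra effort that mapping aims to minimize.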

2.3 Bilateral Comparison Method

In a bilateral comparison two frameworks are compared
textually. The difference between this comparison method
and the two previous ones is its textual nature. A bilateral
comparison is often a summary or explanation of findings
from the other comparison methods.

The bilateral comparison can take on the point of view of 
one framework and describe another in terms of it. This is 
convenient for people with detailed knowledge of one 
framework, because they can easily get insight into another 
using familiar terms. 

The amount of detail included in a bilateral comparison 
can vary widely, depending on the purpose for which it is 
written. Frequently the level of detail is somewhere in 
between that of the characteristics and the mapping 
approaches. 

2.4 Needs Mapping Comparison Method 

Needs mapping is not a direct comparison between 
frameworks. Instead, it considers organizational and 
environmental needs that must be considered when 
selecting which SPI framework to adopt. The 
requirements imposed by such needs are often highly 
demanding and can limit the choice of framework 
severely. Nonetheless, they are of utmost importance and 
must be considered carefully. Here are some examples:

• Certification requirements, for example to ISO 9001,
often imposed on a subcontractor.


• Top-level management requires that the chosen SPI
approach should be incorporated in a Total Quality
Management (TQM) strategy.

• Financial limitations.

There certainly exist other examples as well, and they can
vary substantially from organization to organization, or
depend on the business environment. Furthermore, the
needs may vary over time as the organization or
environment evolves.

3. The Proposed Taxonomy 

We present a list of 25 characteristics, i.e. areas of interest, 
relevant for discussing differences between SPI 
frameworks. Because there are so many characteristics, 
they have been grouped in 5 categories to enhance 
comprehensibility and readability (cf. Figure 1). 

3.1 General Category

This category describes general attributes or features of
SPI frameworks, frequently related to how they are
constructed or designed:

• Geographic origin/spread — Where did the framework
originate and where is it used today?

• Scientific origin — The scientific background on which
the framework is based, e.g. another SPI framework.

• Development/stability — It is desirable to employ an
evolved and relatively stable framework. This is
achieved through experience feedback from real use
over a number of years.

• Popularity — A popular framework tends to receive
better support and further development than an
unpopular framework.

• Software specific — Some frameworks are especially
geared towards software engineering, others are more
general and must be adapted.

• Prescriptive/descriptive — Prescriptive frameworks
prescribe mandatory requirements/processes.
Descriptive frameworks describe a state or certain
expectations to be met without assigning specific
actions to be taken.

• Adaptability — The degree of flexibility in the
framework, e.g. does it support tailoring and
customization for specific uses?

3.2 Process Category 

The process category concerns characteristics that describe
how the SPI framework is used:

• Assessment — Is an assessment scheme part of the
framework and if so, what is assessed?

• Assessor — The assessment can be carried out internally
by the organization itself or by an external group.

• Process improvement method — What kind of guidelines are
included to help implementation and institutionalization
of process improvement?

• Improvement initiation — Where in the organization is the
improvement work initiated, e.g. top-down or
bottom-up?

• Improvement focus — SPI activities regarded as the
most important by the framework.

• Analysis techniques — Does the framework utilize any
quantitative or qualitative analysis techniques, e.g.
statistical process control or questionnaires?

3.3 Organization Category 

The characteristics in this category are directly related to
attributes of the organization and environment in which
the SPI framework is used:

• Actors/roles/stakeholders — Who are the primary people,
groups and organizations affected by the
improvement process and what roles do they hold in
this process?

• Organization size — The framework may be more or less
suitable for an organization of a certain size, e.g.
depending on the required and available resources.

• Coherence — Is there a logical connection between
engineering factors and factors related to the business
or organization [1]? Coherence can exist internally in
the organization or externally between the
organization and its environment.

3.4 Quality Category 

Characteristics in this category are related to the quality
dimension of the frameworks:

• Quality perspective — The concept of good quality
depends on whom you ask, e.g. management, 
customers or employees. 

• Progression — Does the framework measure quality 
progression in a flat, staged or continuous manner? 

• Causal relation — How does the framework measure an 
improvement in quality, i.e. what factors are assumed 
to influence quality? 

• Comparative — Can the framework be used to compare
different organizational units, either internally or 
externally? If so, which aspects are compared? 

3.5 Result Category 

The term result is loosely used in this category, meaning the
outcome originating from the SPI framework adoption:

• Goal — The primary objective or end result of using
the framework.

• Process artifacts — The artifacts created in addition to
the actual product as a result of adopting the 
framework. 


• Certification — Does the framework include an 
assessment leading to certification according to ISO 
or a national standard body? 

• Cost of implementation — Are there any estimates on how
much an adoption and implementation of the 
framework will cost? 

• Validation — What kind of validation efforts have been 
made to evaluate what improvements the framework 
leads to? Such validation should exclude external 
success factors, as they would have been achieved 
even if the SPI framework was not adopted. 
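
The taxonomy of sections 3.1 to 3.5 can be written down as a simple data structure that doubles as a checklist when profiling a framework. This is a sketch only: the characteristic names follow the lists above (a few are best-effort readings of the printed text), while the helper functions and the example value are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# The 25 characteristics grouped in 5 categories.
TAXONOMY: Dict[str, List[str]] = {
    "General": [
        "Geographic origin/spread", "Scientific origin",
        "Development/stability", "Popularity", "Software specific",
        "Prescriptive/descriptive", "Adaptability",
    ],
    "Process": [
        "Assessment", "Assessor", "Process improvement method",
        "Improvement initiation", "Improvement focus", "Analysis techniques",
    ],
    "Organization": [
        "Actors/roles/stakeholders", "Organization size", "Coherence",
    ],
    "Quality": [
        "Quality perspective", "Progression", "Causal relation", "Comparative",
    ],
    "Result": [
        "Goal", "Process artifacts", "Certification",
        "Cost of implementation", "Validation",
    ],
}

Profile = Dict[str, Dict[str, Optional[str]]]


def blank_profile() -> Profile:
    """An empty profile with one slot per characteristic."""
    return {
        category: {name: None for name in names}
        for category, names in TAXONOMY.items()
    }


def unanswered(profile: Profile) -> List[Tuple[str, str]]:
    """Characteristics that still have to be investigated for a framework."""
    return [
        (category, name)
        for category, values in profile.items()
        for name, value in values.items()
        if value is None
    ]


# Hypothetical use: start profiling one framework.
profile = blank_profile()
profile["Process"]["Assessor"] = "external"  # placeholder value
```

Filling one such profile per framework gives the rows needed for a characteristics comparison.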

4. Conclusion 

The goal of comparing SPI frameworks is to provide 
practical insight and guidance when selecting which SPI 
framework to adopt in a software-producing organization. 
Such guidance is needed because of the multitude, 
diversity and comprehensiveness of existing frameworks. 
A natural question is whether those SPI efforts that report 
only a limited degree of success, have adopted the wrong 
frameworks. 

When learning about SPI frameworks it may be necessary
to use a combination of comparison methods, preferably
starting on a high level. The most interesting frameworks
can then be chosen for further investigation, eliminating
the costly task of examining all of them.

We believe that our proposed taxonomy is a suitable 
starting point for such investigations because it describes 
the most important areas of interest. A major strength of 
the taxonomy is its compactness, yet it retains the 
descriptive power of more elaborate comparison methods. 
However, to comprehend the taxonomy fully, some
general SPI knowledge is required. There should be no 
problem collecting material for further investigation, since 
literature on the various frameworks is vast. 
Abstract. We often hear that it is difficult to get software measurement into practice. At least one
important reason for this is that traditional software measurement is not aligned with the strategic 
objectives of the organization. When software measurement is aligned with an organization's 
market discipline then the implementation is accelerated. 


One of the reasons it is difficult to get measurement implemented is that it is unaligned with 
organizational objectives. Measurement is traditionally used to increase quality, increase 
programmer productivity, and reduce costs. Oddly enough, these are not the highest priority 
objectives for a number of organizations, so therefore traditional measurement is difficult to 
implement in them. 

The Discipline of Market Leaders is a survey of how 80 organizations out-achieved their
competitors. The authors found that focusing on one of three market areas was the answer: 
operational excellence, customer intimacy, and product innovativeness. Operationally excellent 
organizations have a "formula" for their service or product. Their menu of choices is small, limited, 
and with that menu they deliver excellently. Standard examples are McDonalds and Federal 
Express. 

Customer intimate organizations seek quite a different market niche, namely a total solution. 
Whatever the customer wants gets added to the menu. The menu is long and custom-made for each 
engagement. Financial service institutions might call customer intimacy a way of getting a greater 
share of the customer's wallet; there are few spending alternatives outside of the services offered:
bank and savings accounts, certificates of deposit, credit and debit cards, travel arrangements, etc. 

Product innovative organizations pride themselves on maximizing the number of turns they get in 
the market. They introduce many new products, selling innovation and features as opposed to, say, 
price. Examples are Intel, 3M, Sony, and Bell Labs. They measure their success by the number of 
new product introductions, the number of patents, and/or the number of Nobel prizes. 

The authors of The Discipline of Market Leaders are quick to point out that all organizations have 
to have at least threshold characteristics of all three disciplines, but they have to focus on and excel 
at only one. One example of lop-sidedness cited was IBM's legendary customer intimacy being 
out-weighed by its inattention to price (that is, operational excellence), so competitors that were 
not as strong in customer intimacy could gain in-roads to IBM customers with price. 

Measurement of the type we are used to, the type espoused by the Software Engineering Institute 
and Quantitative Software Management, applies almost exclusively to organizations wishing to be 
operationally excellent. We typically have nothing to offer to customer intimate and product 
innovative firms in our measurement or improvement methods. 

Many software development organizations do not strive to become operationally excellent, so we 
have left them in the lurch, though we tend to treat them as resisters and of bad character! In fact, it 
is nothing more than a mismatch of goals. There is, for example, a large set of software 
development organizations that strive for customer intimacy and essentially will do anything their 
clients request. Those organizations get to know their clients very, very well, sometimes better than 
the client knows itself. An example of this might be a payroll service that has seen every variation 
on payroll and knows more about payroll processing than any in-house payroll department could. 
The most customer intimate payroll service offeror would take over their customers' payroll 
departments! 

What do you think Microsoft's market discipline is? I think it is product innovativeness. It touts its 
new, glitzy features, not its up-time or reliability. It wants to own/earn its clients based on new 
features, not offering software that is operationally excellent. In that context, the Software 
Engineering Institute's Capability Maturity Model for Software is silent on product innovativeness 
and customer intimacy: it applies only to organizations wanting to be operationally excellent. Same 
for traditional measurement. 

What are we missing in all of this? A more global view, one that listens to and responds to our 
measurement customers. We need to see that the potential rejection of our measurement efforts is 
NOT an indicator of bad character or resistance. It may be an appropriate response to measures 
that do not fit the strategy. We need to jointly problem-solve with our clients to develop new classes 
of measures that simultaneously meet our high standards for objectiveness and their high standards 
for relevance. 

Examples. Let me relate several efforts in which I have participated: 

1. One brokerage house was not interested in software costs or quality, but rather what it called 
time to market. In fact, it was not speed that was so important, but rather that, during the frantic 
time when a deal (such as an initial public offering) was being put together, Information 
Technology was being asked to respond quickly. The response had to be quick enough so that 
the broker could earn as much as possible by offering as many services as possible. It was a 
question of wallet share, which in turn is a customer intimate approach. The brokerage wanted 
the customer to maximize its spending with the brokerage so it had to have the longest menu of 
services possible. We settled on a measure of the percentage of the total deal that did not go to 
the brokerage. I/T's job, then, was to offer a realistic plan for continual reduction of that 
(missed wallet share) figure. 

2. One computer-oriented defense contractor said it wanted project measures, but when pressed it 
was clear that projects were not managed - and therefore not measured - in the traditional way. 
The government client wanted a provider that would do what it requested, not study the request 
and offer alternatives or push-back. Cost, quality, and duration were not important to the client, 
only that it got what it wanted in reasonable terms. This, too, is a customer-intimate approach, 
one that makes the menu of services just as long as the customer requests are. Naturally, the 
provider has to deliver the systems within a threshold value of cost, quality, and duration, but 
there were already many other providers that performed better in terms of cost, quality, and 
duration, but were rated too low in customer responsiveness to be considered! In fact, the 
client changed its mind often, rendering previous work inapplicable. This would cause rework 
that would traditionally be held against the provider. Traditional project-oriented measurement 
was irrelevant in this setting. We recommended several measures: of the total spent by the customer, 
how much went elsewhere (to be minimized); time spent in adversarial settings (to be 
minimized); time spent with the customer understanding its business (to be maximized); and 
number of people on our staff with credentials like our client's (to be maximized). 

3. A computer services firm had been the prime contractor for a long time for a government 
client. The computer services firm provided all of the computer programming and operations 
for a particular type of payment that the government entity made to deserving applicants. The 
contract was up for renewal and the incumbent wanted to propose a set of measures going forward 
that would indicate its operational excellence. The usual suspects were offered in 
discussions with the provider (now bidder), but those measures did not seem to resonate, even 
though they were "reasonable." It turns out that the government organization was feeling 
behind the times in terms of technology and really wanted a new, modern I/T provider, not a 
better, cheaper, faster provider of old technology. In fact, there was no business driver for the 
desire for more modern technology, only a (vague) belief that such technology would reap 
financial benefits to the government in terms of lower costs and greater flexibility. The 
measures we settled on were: 

• plan vs. actual implementation of a set of new technology introductions, 

• hours spent training the government client on the principles of that new technology, 

• reliability measures directly related to the government organization's business, for 
example, cost of government rework due to provider payment errors, idle government 
worker hours due to system downtime, and government time spent in meetings or on the 
phone with deserving applicants due to provider service failures. 

These measures were instead of other, traditional measures, such as percentage system 
availability (e.g., 99.9% available), data entry error rates (0.1%), and a threshold number of 
ABENDS per day, none of which related to the government mission or daily reality. 
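
The "missed wallet share" measure from the first example lends itself to a short calculation. The following is a minimal sketch in Python, assuming hypothetical deal values and hypothetical function names (`missed_wallet_share`, `is_continually_reducing`); none of these come from the engagements described above. It computes the percentage of a deal that did not go to the provider and checks whether that figure falls from one period to the next.

```python
from typing import Sequence


def missed_wallet_share(total_deal_value: float, captured_value: float) -> float:
    """Return the percentage of the total deal that did NOT go to the provider."""
    if total_deal_value <= 0:
        raise ValueError("total_deal_value must be positive")
    if not 0 <= captured_value <= total_deal_value:
        raise ValueError("captured_value must lie between 0 and total_deal_value")
    return 100.0 * (total_deal_value - captured_value) / total_deal_value


def is_continually_reducing(history: Sequence[float]) -> bool:
    """True if each period's missed wallet share is lower than the one before."""
    return all(later < earlier for earlier, later in zip(history, history[1:]))


# Hypothetical deals: (total deal value, value captured by the provider).
deals = [(200.0, 120.0), (200.0, 150.0), (400.0, 340.0)]
history = [missed_wallet_share(total, captured) for total, captured in deals]
print(history)
print(is_continually_reducing(history))
```

The same shape of calculation could be applied to the "how much of the customer's spending went elsewhere" measure in the second example; what counts as a "period" or a "deal" is a choice each engagement would have to make.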

New model emerging. There is a new model of systems development emerging. It is consistent 
with the lessons of The Discipline of Market Leaders. The place it is first seen is a new breed of 
systems developers: fixed price, fixed duration efforts. Their model is something like: 


[Figure: Totality of Systems Development] 



The totality of the systems development effort is internally divided into two phases: obtaining 
customer requirements and developing a system to meet those requirements. Obtaining 
requirements is an open-ended effort, difficult to estimate, and bid on a time and materials basis. 
Once the requirements are obtained, they are more or less thrown over a wall to a heads-down 
software factory. There the requirements are quickly transformed into an operational system. 

Changes to the requirements are not allowed during the factory period. It isn't that changes are not 
requested, but rather they are queued and made candidates for the next release. A small percentage 
of high-priority changes can be accepted and passed on to the factory, but usually it is a single digit 
percentage, by contract agreement. 

Because the factory can work with its head down, it is fast and good. It has learned how to be, 
perhaps by emulating/applying the best practices promulgated by the NASA-CSC-University of 
Maryland Software Engineering Laboratory, the SEI, Cleanroom, and others. 

If the requirements need to change dramatically before the system is developed, then the whole 
arrangement changes back to what we traditionally have today: gather requirements, try to freeze 
(or at least chill) them, develop a system to meet the requirements, in the midst of that then change 
requirements again and try to absorb the newest changes, etc. 

The newer¹ business model achieves several objectives: 

1. People attracted to dealing with customers face-to-face do that and only that. 

2. People attracted to dealing with the technical development of systems do that and only that. 

3. People who like to span those boundaries get to do that, too, because they are part of both the 
requirements elicitation and the technical systems development so that, in fact, the 
requirements are not thrown over a wall. 

Technology improvement and change management are different in the two areas. Technical 
development is the stuff we are used to seeing, but the technology of customer intimacy used in 
requirements elicitation is new, both in the technical aspects and in how change is introduced into an 
ongoing working relationship between technology solution providers and their clients. 

I see ever more organizations offering this newer model. What would be the downside to the 
client? Anyway, the measurement implication is that a wholly different set of measures would 
apply to the customer intimate activity than to the technical one; the technical one is more or less a 
solved problem. Now we as a profession need to turn to the other two disciplines of market leaders 
and offer them something! 

Acknowledgements. I learned most of this by working with John Title of Computer Sciences 
Corporation. The measurement leader who made me ask myself many of these questions is David 
Card. Many SEI SEPG conference keynote speakers/cheerleaders who claim that those who resist 
have bad character have irritated me into writing this down. Their failure to ask (and answer) 
"Why?" stimulated me. 

¹ I hesitate to call it "new" because Winston Royce, that doyen of our profession, described it in a keynote address at 
the National Security Industrial Association Seventh Annual National Joint Conference, April 23, 1991, at Tyson's 
Corner. His talk was entitled, "A completely new software life cycle." 

Abstract 

Software IV&V, as practiced by the NASA IV&V Facility, is a well- 
defined, proven, systems engineering discipline designed to reduce risk in 
major software systems development. However, we currently have no 
proven methodology for estimating resource requirements for IV&V based 
on sound financial criteria. The quantification of a cost structure associated 
with IV&V and the resulting benefits is essential to making objective 
decisions concerning the allocation of resources to IV&V activities. The 
development of ROI metrics for NASA IV&V would provide key 
information to make rational budgetary decisions that impact safety and 
mission critical aspects of all NASA software systems. To measure IV&V 
benefits and costs we must identify relevant measures and provide target 
ranges for those measures that may be used to evaluate whether or not the 
goals are achieved and to what degree. This requires a measurement 
strategy for software IV&V in the NASA context. This paper presents the 
NASA IV&V Balanced Scorecard strategic measurement framework and 
discusses its role in providing a minimal and usable core metrics set. 


1 Introduction 

The Balanced Scorecard, as applied in industry and government, is approached from 
two very disparate viewpoints. Industry is very aware of the importance of financial 
performance measures in managing an organization. Publicly held companies must 
be responsive to market and shareholder demands. Market share, share price, 
dividend growth, and other significant results-oriented financial measures have been 
used historically to evaluate an organization. Government organizations must 
respond to regulatory and legislative acts. One such legislative act is the 
Government Performance and Results Act (GPRA) passed by Congress and signed 
by the President in 1993. This act provides a new tool to improve the efficiency of 
all Federal agencies. 



The goals of GPRA are to: 

■ Improve Federal program management, effectiveness, and public 
accountability 

■ Improve congressional decision making on where to commit the Nation’s 
financial and human resources 

■ Improve citizen confidence in government performance 

A specific difference between government and industry is explicit in the 
government’s focus on cost reduction as compared to industry’s focus on revenue 
generation and profitability. We have customized our BSC to accommodate these 
differences thus providing a framework to evaluate the overall performance of the 
organization through a linked hierarchy of specific performance drivers and 
outcome measures [7]. 

1.1 Structure of the Paper 

Section 2 provides an overview of the Balanced Scorecard and motivations for its 
use. We then excerpt portions of our scorecard to exemplify our measurement 
framework, the application of cause effect graphing and the setting of strategic 
measurement targets in Section 3. Section 4 discusses specific BSC measurement 
issues and lessons learned. Section 5 concludes our paper and discusses current 
directions of our work. 

2 Balanced Scorecard 

The Balanced Scorecard (BSC) Framework provides the necessary structure to 
evaluate quantitative and qualitative information with respect to the organization’s 
strategic vision and goals. There are two categories of measures used in the BSC: the 
leading indicators or performance drivers and the lagging indicators or outcome 
measures. The performance drivers or leading indicators enable the organization to 
quantitatively track whether or not the organization is achieving short-term 
operational improvements. The outcome measures or lagging indicators provide 
objective evidence of whether strategic objectives are achieved and to what degree. 
The two measures must be used in conjunction with one another to link 
measurement throughout the organization, thus giving visibility into the 
organization's progress in achieving strategic goals through process improvement 
[14]. 

The development of a core set of metrics for implementing the Balanced Scorecard 
is the most difficult aspect of the approach. Developing metrics that create the 
necessary linkages of the operational directives with the strategic mission proves to 
be fundamentally difficult, as it is typical to view organizational performance in 
terms of outcomes or results rather than to focus on metrics that address performance 
drivers and provide feedback concerning day-to-day organizational progress. 

The BSC is not the organizational strategy but rather a measurement paradigm to 
provide operational and tactical feedback. The organizational strategic vision and 
goals are the foundation upon which the framework is constructed and are taken 
from public domain documents. The strategic plan contains the vision, goals, 
mission and values for the organization. The Government Performance and Results 
Act (GPRA) requires all federal agencies to establish strategic plans and measure 
their performance in achieving their missions. The vision and goals are stated below. 

Vision: To be world-class creators and facilitators of innovative, intelligent, 
high performance, reliable informational technologies that enable NASA 
missions. 

Goals: To become an international leading force in the field of software 
engineering for improving safety, reliability, quality, cost and performance 
of software systems; and to become a national Center of Excellence (COE) 
in systems and software independent verification and validation. 

3 BSC Architecture 

The BSC architecture was intended to provide a framework for industry and 
for-profit organizations. The framework facilitates translating the strategic plan into 
concrete operational terms that can be communicated throughout the organization 
and measured to evaluate its day-to-day viability. The three principles of building a 
balanced scorecard that is linked through a measurement framework to the 
organizational strategy include: 

(1) defining the cause and effect relationships, 

(2) defining the outcome measures and performance drivers, 

(3) linking the scorecard to the financial outcome measures [5]. 

The initial steps of BSC engage in the construction of a set of hypotheses 
concerning cause and effect relationships among objectives for all four perspectives 
of the balanced scorecard. The measurement system makes these relationships 
explicit. Therefore, they can be used to assess and evaluate the validity of the BSC 
hypotheses. The questions asked in each category of the four perspectives provide a 
segue into the cause effect diagramming activity. It is this activity that exposes the 
value chain associated with specific IV&V activities. 

3.1 Defining the Cause-Effect Relationships 

IV&V is conducted using different approaches and methods depending on the goals of 
the IV&V team. To define causal relationships we must evaluate the measurement 
based on a context sensitive method: 

1) Identify the underlying IV&V process relative to the development process. 

2) Identify the activities (methods, models and tools) by inputs and outputs and 
entry and exit criteria. 

3) For activities categorized as information management IT, measure the value of 
information to decrease uncertainty, mitigate risk, improve quality... 

4) For analysis activities we define the value for the outputs such as problem reports 
at a given time in the lifecycle and by criticality. 

We begin by formulating hypotheses concerning the value of IV&V in a given 
context of the Space Shuttle IV&V activities. The hypotheses are based on inferred 
or known relationships documented in prior studies reviewed under the first phase of 
our ROI project. We state the initial hypotheses as constructed; however, their 
review and evaluation are an ongoing activity. 



The hypotheses developed rest on several assumptions drawn from our 
current understanding of the interaction of the IV&V process and shuttle 
development process. The Space Shuttle is considered a product-line as defined by 
the SEI as well as the general research community. The characteristics that make the 
shuttle a product line process include the systematic reuse of a set of core 
architectural and component based assets that are reused in each incremental release. 
This core commonality is extended to support each operational increment (OI) and 
represents a negotiated and limited degree of domain variability. 

Hypothesis 1: The benefits of IV&V contributions are realized as domain 
engineering and applications engineering benefits. This means some 
benefits should accrue to the core structure of shuttle software and be an 
ongoing contribution in its maintenance and extensibility. 

Hypothesis 2: The benefits of the application engineering accrue almost 
entirely to the developer. That is, the defect reduction that occurs in 
development is enabled in part by IV&V contributions to domain 
engineering. 

Hypothesis 3: The benefits of product-line engineering in the shuttle are 
significant in reducing testing costs while maintaining high levels of testing 
quality. The degree of test suite and test environment reuse is exceptionally 
high and results in a significant cost savings. 

Hypothesis 4: This is fundamentally a unique system that is developed using 
sophisticated reuse. This requires us to view the system as generating shuttle 
“builds” from an investment of core assets. The benefits are primarily 
derived in the reusability and rapid extensibility of the shuttle code. 

Hypothesis 5: Adherence to an architecture enables system safety, reliability 
and quality standards to be imposed and verified for the core assets of the 
shuttle. Acceptable degrees of variability to extend functionality are 
approved by a team of architects and systems engineers that includes the 
IV&V team. 

We map our hypotheses to a set of objectives concerning the value of IV&V and the 
necessary and sufficient factors for creating value for the organization in terms of the 
strategic vision and goals. The BSC is segmented into four categories of objectives: 
customer, financial, internal business processes, and learning and growth segments. 
The objectives for the four segments are the following: 

■ customer segment objectives correspond with the high level goals of mission 
success through high quality, reliability and safety. 

■ financial segment objectives focus on cost reduction, efficient asset utilization 
and high ROI values of IT investments. 

■ internal process objectives relate to specific software and systems engineering 
approaches such as product-line development paradigms, CPI and QIP efforts, 
and test technologies and best practices as defined for IV&V. 

■ learning and growth objectives include technological infrastructure for 
distributed development, workforce training programs, skills assessment 
program, and ISO-9000 process structure. 



[Figure 1.1 diagram nodes: Mission; Functional Requirements; Quality Objective; Reliability Objective; Safety] 

Figure 1.1 Influence diagram of IV&V BSC objectives. 


The objectives are used in the selection of a minimum set of required metrics to 
measure day-to-day performance as well as longer term outcome or results metrics. 
This aspect of the framework focuses on development of leading and lagging 
indicators. An example customer focused objective would be the improvement in 
overall safety due to IV&V activities. A leading indicator for this objective could be 
the number of identified potential hazardous states resulting from a safety impact 
analysis or a tracking of the hazard rate during development. A result measure or 
lagging indicator could be the number of in flight anomalies (IFAs) that are 
documented. The leading and lagging indicators must be assigned desired or 
normative values. These values become targets or target ranges for the metrics 
collected. Finally, the initiatives that have been sponsored to achieve the objective are 
identified and reevaluated with respect to the quantitative and qualitative evidence 
of success relative to the target values (see Table 1.1). 


Customers (Internal/External) 

Objectives     Measures           Targets          Initiatives 
No Losses      # Severity 1 & 2   Remove < FRR     Formal Methods 
Reduce Risk    # IFAs             No Severity 1    Risk Management 
Manage Risk    Fault tolerance    Performance      Risk Mitigation 


Table 1.1 Customer focus metrics definition. 
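
The objective, measure, target, and initiative columns of Table 1.1 can be pictured as a small record type. The following is a minimal sketch in Python, not the facility's actual tooling: the class name `ScorecardEntry`, the numeric target range, and the pairing of this particular measure with this initiative are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorecardEntry:
    """One row of a scorecard: an objective, its measure, and a target range."""
    objective: str
    measure: str
    indicator: str      # "leading" (performance driver) or "lagging" (outcome)
    target_low: float   # hypothetical acceptable range, inclusive
    target_high: float
    initiative: str

    def within_target(self, observed: float) -> bool:
        """True if the observed value falls inside the target range."""
        return self.target_low <= observed <= self.target_high

    def distance_from_target(self, observed: float) -> float:
        """How far the observed value lies outside the target range (0 if inside)."""
        if observed < self.target_low:
            return self.target_low - observed
        if observed > self.target_high:
            return observed - self.target_high
        return 0.0


# Illustrative lagging indicator: no Severity 1 in-flight anomalies.
safety_outcome = ScorecardEntry(
    objective="No Losses",
    measure="number of Severity 1 in-flight anomalies",
    indicator="lagging",
    target_low=0.0,
    target_high=0.0,
    initiative="Formal Methods",
)
print(safety_outcome.within_target(0))
print(safety_outcome.distance_from_target(2))
```

Representing the target as a range rather than a single value follows the text's mention of "targets or target ranges"; a point target is simply a range whose two ends coincide.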




The relationships among the customer objectives of interest are significant as they 
are not independent of one another and therefore must be analyzed based on their 
degree of covariance and interaction. The relationships are diagrammed in Fig. 1.3 and 
depict the currently accepted understanding. Safety requires that unsafe states cannot 
be entered from any point of function of the system. It is possible for the systems to 
function reliably, that is, without failure, and still enter unsafe states of operation. A 
system can be completely correct and defect free and still enter unsafe states. There 
are many documented examples of these properties in the literature and many 
devoted specifically to documenting the complexity of software safety issues. The 
safety of a system is a result of its safe operation in a specific context or 
environment. We provide definitions of safety, reliability, quality and cost as 
defined for the customer objectives of the BSC. 

■ Safety is defined as freedom from accidents or losses. This is an absolute 
statement; safety is more practically viewed as a continuum from no accidents 
or losses to acceptable levels of risk of loss. 

■ Reliability is defined in terms of probabilistic or statistical behavior, that is, 
the probability that the software will operate as expected over a specified period 
of time. 

■ Quality is defined in terms of correctness and number of defects. Correctness is 
an absolute quality; it is also a mathematical property that establishes the 
equivalence between the software and its specification. 

■ Cost is more complex than it appears; direct or absorption costing may be 
applied, and the choice alters what costs are included and therefore what costs 
may be reduced. This paper does not rely on the differences between 
these two approaches and therefore defers discussion of this topic. 

The NASA IV&V facility must document the increase in software and systems 
safety, reliability and quality that are attributable to IV&V technologies. This 
requires quantifying the contribution that IV&V activities make towards meeting the 
required targets, which in turn requires that each aspect 
be evaluated relative to some objective target. The value added by IV&V is measured 
as the sum of the overall reductions in distance from the targets. This provides a measure 
of overall impact on mission success. The relative reduction in “Euclidean distance” 
from the safety target of no losses that is specifically attributable to IV&V is documented 
and integrated into the overall model, which sums the total reduction in distance from 
the three targets of safety, reliability and quality. There are many measures that can 
be collected to evaluate the value added of IV&V for software and system safety; 
this is only one approach. The measurement of the contribution of IV&V in 
improving safety, reliability and quality while reducing cost is discussed in the 
following sections. 
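
The "sum of reductions in distance from the targets" can be illustrated with a small calculation. The following is a minimal sketch in Python under stated assumptions: each of safety, reliability, and quality is represented by a single number, the three are already normalized to a common scale so that summing them is meaningful, and all values and function names are hypothetical. In one dimension the Euclidean distance reduces to an absolute difference.

```python
from typing import Mapping


def distance_from_target(observed: float, target: float) -> float:
    """One-dimensional Euclidean distance between an observed value and its target."""
    return abs(observed - target)


def value_added(
    without_ivv: Mapping[str, float],
    with_ivv: Mapping[str, float],
    targets: Mapping[str, float],
) -> float:
    """Sum, over all targets, of the reduction in distance attributed to IV&V."""
    return sum(
        distance_from_target(without_ivv[name], target)
        - distance_from_target(with_ivv[name], target)
        for name, target in targets.items()
    )


# Hypothetical, pre-normalized scores (assumption: a common 0-100 scale).
targets = {"safety": 0.0, "reliability": 100.0, "quality": 2.0}
without_ivv = {"safety": 4.0, "reliability": 70.0, "quality": 6.0}
with_ivv = {"safety": 1.0, "reliability": 90.0, "quality": 3.0}

print(value_added(without_ivv, with_ivv, targets))
```

A positive result means the measures moved closer to their targets. How to attribute the "with" and "without" values to IV&V specifically is the hard part, and this sketch does not address it.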



Fig. 1.3 Relationships among customer themes of mission success through safety, reliability, and 
quality at reduced costs. 



4 BSC Issues and Lessons Learned 


The four strategic mission goals of importance to our customers are safety, 
reliability, quality and cost. This section discusses those aspects in terms of 
measurement as is defined in the balanced scorecard. 

SAFETY The contribution of IV&V to shuttle safety is difficult to measure directly. 
It is therefore necessary to make assumptions concerning those factors that would 
impact safety and to what degree. It is assumed that a reduction in the probability of 
failure is a contribution to increased safety. A reduction of the number of In Flight 
Anomalies (IFAs) of a severe nature due to IV&V identification and removal is a 
contribution. An independent evaluation of potential failure modes that results in 
identifying previously unidentified hazards is a contribution. 

RELIABILITY The contribution of IV&V to shuttle reliability is more directly 
attributable to the specific verification activities that are applied during the Shuttle 
software development process towards defect management. Research investigating 
the ramifications of testing strategies for reliability provides quantification of 
benefits relative to specific IV&V activities. A minimization of estimated residual 
faults is provided according to the sequence of testing strategies and the duration of 
those test executions. For example, one can count the number of defects detected by applying 
functional, decision, data flow and mutation test methods in sequence. The CPU 
execution time or the number of test cases can measure test effort. As the test effort 
increases, defects detected can be optimized through applying more optimistic or 
pessimistic test strategies. The resulting increase in reliability is measured by 
increased MTTF or improved failure intensity profiles and is quantified as a 
reduction in the distance from the reliability targets of subsystems undergoing 
IV&V. 
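
As a rough illustration of "increased MTTF" expressed as a reduction in distance from a reliability target, the following minimal Python sketch estimates MTTF as the mean of observed inter-failure times. That estimator, the target value, and the sample data are all assumptions chosen for simplicity; they are not the reliability models referred to above.

```python
from typing import Sequence


def mean_time_to_failure(interfailure_hours: Sequence[float]) -> float:
    """Estimate MTTF as the arithmetic mean of observed inter-failure times."""
    if not interfailure_hours:
        raise ValueError("at least one inter-failure time is required")
    return sum(interfailure_hours) / len(interfailure_hours)


def reliability_shortfall(mttf: float, target_mttf: float) -> float:
    """Distance below the MTTF target (0 once the target is met or exceeded)."""
    return max(0.0, target_mttf - mttf)


# Hypothetical inter-failure times (hours) before and after added verification.
before = [10.0, 20.0, 30.0]
after = [30.0, 40.0, 50.0]
target = 50.0  # hypothetical subsystem MTTF target

shortfall_before = reliability_shortfall(mean_time_to_failure(before), target)
shortfall_after = reliability_shortfall(mean_time_to_failure(after), target)
print(shortfall_before - shortfall_after)  # reduction in distance from the target
```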

QUALITY The contribution of IV&V to shuttle quality is measured as a reduction 
of defect density trends through process improvement paradigms such as traversing 
the CMM stages from levels 2, 3, and 4 to level 5. The intuition behind this model is that 
the measurable impact of process improvement is in the reduction of the cost of 
rework. Specific examples of applying this concept are documented in the literature 
and state substantial savings associated with rework avoidance. Raytheon Systems 
Corporation reported cost savings of $15.8 million for 15 projects over a four-year 
period. Raytheon documents an ROI of 7:1 based on $4.48 million return for 
$580,000 invested. Hughes Aircraft reported cost savings of $9.2 million over a 
three-year period. Hughes documents an ROI of 4.5:1 based on $2 million return on 
$400,000 invested. The Aircraft Software Division at Tinker Air Force Base 
reported an ROI of 6.35:1 based on a return of $2.9 million for $462,100 invested. 
In addition, the rework cost avoidance of detecting defects of severity 1, severity 2 
and severity 3 can be quantified relative to phase of detection and level of severity. 
The reduction of defect density is measured as a reduction of distance from the 
overall quality objective measured in defect density according to severity. 

COST In the early 1990’s the software engineering community adapted ROI to 
measure the costs and benefits of SEI/CMM process improvement efforts. Published 
examples of how ROI for CMM based process improvements are measured and 
interpreted provide guidelines for the basic proposed ROI model [7,13]. The process 
community quantified process and product improvement using the following four 
major development-cost structures drawn from Crosby’s work as published in 
“Quality is Free” and “Quality Without Tears” [3,4]. Crosby’s work is referenced by 
Capers Jones as the seminal work in this area and has been used as the basis for cost 
structuring by DoD contractors such as Raytheon Systems [17]. The cost categories 
include: 

1. nonconformance rework costs (such as fixing code defects or design 
documentation), 

2. performance costs associated with doing it right the first time (such as 
developing the design or generating the code), 

3. appraisal costs associated with testing the product to determine if it is faulty, and 

4. prevention costs incurred trying to prevent faults from degrading the product. 

Industry has applied these four cost categories to measuring ROI for software 
process improvement by using rework costs avoided (nonconformance costs 
avoided) as the numerator and appraisal and prevention costs directly related to 
process improvement efforts for the denominator [7,18]. The intuition behind this 
model is that the measurable impact of process improvement is in the reduction of 
the cost of rework [3,4,10,11]. 
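
The ratio described above (rework costs avoided divided by the appraisal and prevention costs of the improvement effort) can be written out directly. The following is a minimal sketch in Python; the function name and the dollar figures are hypothetical and are not taken from the studies cited.

```python
def process_improvement_roi(
    rework_cost_avoided: float,
    appraisal_cost: float,
    prevention_cost: float,
) -> float:
    """ROI ratio: nonconformance (rework) costs avoided per dollar invested
    in appraisal and prevention for the process improvement effort."""
    investment = appraisal_cost + prevention_cost
    if investment <= 0:
        raise ValueError("appraisal plus prevention cost must be positive")
    return rework_cost_avoided / investment


# Hypothetical figures in dollars.
roi = process_improvement_roi(
    rework_cost_avoided=1_000_000.0,
    appraisal_cost=150_000.0,
    prevention_cost=100_000.0,
)
print(f"ROI = {roi:.1f}:1")
```

Performance costs (doing the work right the first time) do not appear in the ratio, which matches the description above; whether a given cost belongs under appraisal or prevention is an accounting decision the sketch leaves open.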

A measurement framework is necessary to bridge the gap between strategic 
measures of improved reliability, safety, and quality at reduced cost and operational 
measures of optimization of resource allocations applicable to daily activities to 
achieve these goals. The BSC provides a means of measuring the efficiency of 
resource allocations for the operational processes of software and systems 
verification and validation activities that must then be linked to the high level goals 
of mission success at reduced cost. In applying the BSC we have learned many 
lessons of value concerning our strategic planning as it relates to the activities 
conducted to accomplish daily operational goals. First, we have found that a 
customer focus of the strategic themes provides the necessary linkages in the BSC to 
measure our leading and lagging indicators successfully. We have also learned that 
the CMM and ISO-9000 initiatives are split across the core process tier and the 
infrastructure tier of the BSC hierarchy. These two findings are essential in applying 
the BSC to a government or not-for-profit organization such as the NASA IV&V 
Facility. 

5 Future Directions 

The primary focus of learning and growth measures for IV&V specifically is the 
information technologies (IT) used to obtain, retrieve, disseminate and store key 
information products [6]. The IV&V Facility is located in West Virginia and yet 
services all the NASA Centers from the Pacific to Atlantic coasts. To support this 
distributed context, communications technologies such as VITS and VOTS and internet 
tools such as web-based data collection repositories are required. Specific measures 
to quantify performance, cost, and quality for IT infrastructure to support IV&V 
technologies must be further evaluated to provide meaningful target ranges for IT 
performance metrics. 



In addition, further investigation into the measurement of core processes as defined 
under ISO is required. The ISO-9126 standard documents six high-level software 
qualities including functionality, reliability, usability, efficiency, maintainability and 
portability. These high-level qualities are mapped to 24 sub-characteristics. Metrics 
are proposed to measure the high-level software qualities relative to the 
sub-characteristics. This ISO standard could provide the necessary metrics to measure 
operational processes under the process aspect of the BSC, relative to the 
application of product line reuse, and map them to the high-level goals. Of particular 
interest in this standard is the definition of reusability as the combination of 
maintainability and portability. It will be of interest to analyze the appropriateness 
of the standard in measuring reuse for the shuttle [9], specifically reuse across a 
vertical product line that incorporates domain engineering, architecture-based reuse, 
and reusable test technologies. 
Software Requirements Engineering 

- Abductive reasoning 

- Model based reasoning 
Abstract 

In November 1998 the CSC SEAS Center achieved the rating of CMM Level 5 and 
became the sixth organization in the world to attain that rating. The 
Capability Maturity Model (CMM) (Reference 1) is a globally recognized benchmark 
of process maturity for software organizations and is used to assess the quality of an 
organization’s software process. During the period covered by this study, the SEAS 
Center comprised approximately 850 personnel supporting systems engineering, software 
development, and analysis for NASA/GSFC. During the years of continually improving 
the processes toward the goal of attaining the level 5 rating, detailed information was 
recorded, tracked and analyzed so that subsequent efforts by other CSC organizations 
could benefit from the experiences of SEAS. This paper is a direct result of the 
collection and analysis of that process experience data. 

This paper begins with a brief overview of the SEAS organization that emphasizes the 
aggressive process improvement approach that has been in place since 1994. It then 
discusses the coordination of improvement initiatives, the role of goals and industry 
benchmarks, the organizational strategy, and the use of key documents in measuring 
improvements. The investment in and benefits of an improvement program are also 
discussed. Finally, based on the SEAS experience, the paper presents seven key 
recommendations for any software organization undertaking an aggressive 
process improvement program. 
Section 1 Background 

CSC is a major software integration and services provider with over 50,000 employees in 
offices worldwide. The Systems, Engineering, and Analysis Support (SEAS) Center is 
part of the Federal Sector and comprises approximately 850 persons supporting the 
National Aeronautics and Space Administration (NASA) at the Goddard Space Flight 
Center (GSFC) in the disciplines of systems engineering, development, maintenance, and 
analysis (Figure 1-1). 

CSC has supported NASA in the GSFC environment since the 1970s. Staffing at the 
Center has varied from 700 to 1700 over the last 10 years. The SEAS Center is organized 
as a program with central offices supporting program management (PMO), process 
engineering (PEO), quality assurance (QAO), and program control (PCO). Software 
configuration management is typically a project responsibility and subcontracting for 
product development is very rare. The number of projects within the program varies but 
is typically about 20. Approximately 50% of the organization is directly involved in the 
software development or maintenance activity. 



Figure 1-1 SEAS Center Within CSC 

Because of the growing importance of establishing process maturity within 
software-intensive organizations, the SEAS Center initiated an aggressive process 
improvement program in 1995. A process improvement plan with specific goals was 
written to guide the initiative. Of the goals, four were product goals with objective 
measures (productivity, quality, predictability, cycle time), and another goal specified compliance 
with standard industry benchmarks. 

The processes used to support the work on SEAS have always been regarded (by the CSC 
staff) as good, although an early external evaluation of the processes 
produced a Level 1 CMM rating in 1991. Despite this discouraging early result, the 
Center continues to view benchmark evaluations as an important activity supporting 
process improvement efforts (Figure 1-2). 

After some success with internal process audits and CMM self-assessments, the SEAS 
Center adopted the use of evaluations against industry benchmarks conducted by 
independent consultants. The 1995 process improvement plan included goals for both 
CMM and ISO 9001 (hereafter referred to as ISO) evaluations. 

The results of benchmarking activities are summarized in Table 1-1. In 1998, the SEAS 
Center became the sixth organization in the world to be rated at CMM Level 5 and the 
first organization to be both CMM Level 5 and ISO registered. 


Timeline of benchmarking events, 1994 through 2000. Legend: 

SCE 

ISO 9001 registration audit (R), surveillance audits (S) 

Software process self assessments (SPA) and software process audits 


Figure 1-2 SEAS Center Benchmarking History 
Table 1-1 Summary of Benchmarking Activities 

Section 2 Approach 

As discussed in Section 1, SEAS had an extensive legacy of process development and 
improvement at the time that it achieved CMM Level 5 in 1998. SEAS process 
development work during the late 1980s and early 1990s consisted primarily of 
refinements of the SEAS System Development Methodology (SSDM) and its supporting 
standards and procedures (S&Ps). Such refinements were recommended by process users 
and approved by senior management. This bottom-up approach worked reasonably well 
and resulted in the establishment, deployment and use of SSDM and approximately 100 
S&Ps. 

Between 1989 and 1994, 508 proposed changes to SSDM were submitted by process 
users; of these, 379 were implemented in whole or in part. Unfortunately, most concerned 
relatively minor adjustments to existing processes. SEAS management noted three major 
flaws in this process improvement strategy: (1) a formal “learning through 
experimentation” process was not being used, (2) establishment of goals and measurement 
of their achievement were weak, and (3) SSDM and its associated S&Ps were becoming 
obsolete because new approaches and methods were not being adequately integrated. A new 
approach was needed. 

During the early 1990s the Quality Improvement Paradigm (QIP) (Reference 2) was 
being used in the SEAS Software Engineering Laboratory (SEL). (The SEL, Reference 3, 
is a joint venture involving CSC, NASA and the University of Maryland.) The QIP, 
shown in Figure 2-1, established a framework for improving SEAS by treating projects as 
experiments, packaging results, and making such results available to all SEAS projects. 
The QIP eliminated the process improvement flaws noted above and was accepted by 
SEAS management as a solid foundation upon which to build the SEAS improvement 
program. Since 1994 the QIP has served as the model for process improvement for the 
SEAS Center. 

The SEAS adoption of the QIP as its improvement model focused attention on improving 
key activities such as communication, coordination, establishment of goals, measurement 
of change, and experience sharing. Thus, attention was redirected from refinement of 
existing processes to making SEAS a learning organization based on the experiences of 
its projects. Adoption of the QIP radically changed how improvement was addressed by 
the organization. Some of these changes are briefly discussed below. 
1. SEAS-Level Coordination of Projects’ Process Improvement Initiatives. 

The QIP is based upon the assumption that a Program-level group is aware of project- 
level experiments, provides guidance to projects, and makes successes and failures 
known to other projects within the Program. For SEAS, responsibility for this type of 
coordination was assigned to the PEO. Use of “shepherds” and weekly ‘Process 
Deployment Team Meetings’ as described below directly resulted from adoption of the 
QIP. 

• Shepherds are typically Process Engineers or Quality Assurance personnel who are 
aware of activities and project experiments throughout the organization. The 
shepherds are assigned to work directly with a project to guide process 
implementation and avoidance of problems experienced by prior projects. The 
shepherds perform as project support personnel in responding to needs of the projects 
in tailoring, understanding, and implementing processes appropriate for the project. 

• Process Deployment Team Meetings are weekly 1-hour meetings held to discuss 
some aspect of the SEAS processes. The meetings are facilitated by a process 
engineer and attended by all levels of management and some personnel from the 
projects. Typically, these meetings take the form of a briefing followed by questions, 
answers and comments regarding the given topic. Topics have included: top 10 steps 
in adopting mature processes, effective use of measurement, how ISO and CMM are 
related, impacts of inspection techniques for software, how to set goals in project 
planning, effective risk management, how our processes conform to Level 5, and 
results of recent project experiments presented by project personnel. The meetings are 
interactive, with all participants joining in the discussion. 

2. Establishment of Product-related Goals Rather than a Goal of 
Compliance with Industry Benchmarks 

The QIP requires establishment of goals. For organizations such as SEAS, compliance 
with industry benchmarks is an important business goal. Much of SEAS process-related 
work in the early 1990s was directed to the goal of demonstrating compliance with the 
CMM. However, once the QIP had been adopted as the improvement model, SEAS goals 
evolved from a focus on complying with industry benchmarks to a focus on improving 
products and achieving customer satisfaction. Project buy-in to use of the QIP was easily 
achieved once projects appreciated the value of learning to improve their products based 
on the experiences of prior projects. 

3. Use of Industry Benchmarks as Tools to Achieve Product-Related Goals 

SEAS established ISO-9001 as its primary tool for guiding and measuring improvement. 
Similarly, the SEI CMM served as a tool for measuring progress in improving the SEAS 
software development processes. ISO requires participation by all elements of the 
organization, in contrast to the software development focus of the CMM. However, ISO- 
9001 and the CMM are complementary and support the product improvement strategy as 
embodied in the QIP. (As a byproduct, use of ISO and CMM supports senior 
management’s business goal of compliance with key industry benchmarks.) Industry 
benchmarks such as ISO and the CMM served as gates for verifying process maturity and 
use. Use of external assessors ensured objectivity in measuring progress toward 
achievement of goals related to compliance with industry benchmarks. 

4. Use of ‘Separation of Concerns’ Strategy 

Project personnel were not required to become familiar with the details of the QIP or 
industry benchmarks; deployment of the QIP, ISO, CMM and other strategies was 
assigned to the process engineers. This left projects free to focus their limited resources 
on improving their products and services rather than on complying with industry 
benchmarks. As discussed above, the shepherds provided guidance to projects in 
applying the QIP and complying with the industry benchmarks. 

5. Document Organization Profile and Improvement Goals 

Application of the QIP requires an understanding of current product characteristics 
(defect rates, cycle time, accuracy of estimates, etc.) and improvement goals. Therefore, 
consistent with the QIP, SEAS documented its organizational and product characteristics 
in a profile document (Reference 4) and established SEAS-level improvement goals in a 
process improvement plan (Reference 5). These documented “where we are” and “where 
we want to go”, and served as the roadmap for measurable process improvement. The 
SEAS Quality Management System Manual (Reference 6) documented the roles and 
responsibilities of each SEAS group in achieving improvement. 
The QIP worked well for SEAS and would likely work well for other organizations. 
The reader is cautioned, however, that for maximum effectiveness it should be 
applied with consideration given to the culture and maturity of the organization. For 
SEAS the approach was to focus on identification and deployment of a formal model 
since basic processes were already in place. Recommendations for other organizations are 
provided in Section 4. 
Section 3 Return on Investment 

To determine the value of the investment made in process improvement, the 
SEAS Center measured the impact of improvements in three areas: (1) the 
performance of the organization, (2) business opportunities, and (3) 
the products generated. This set of measures of ‘return on investment’ was used to 
continually mold the program of process improvement and to help determine which areas 
of improvement should be the focus of continued efforts. The measures were also used to 
determine whether the process improvement program was worth the 
investment of time and resources and whether the program should be continued or 
modified. The value of the process program was measured against the cost of the overall 
program. This value of the program compared to the investment cost is what we term 
‘return on investment’. 

3.1 Cost of Process Improvement 

The cost of the process program was tracked by maintaining detailed records of the effort 
expended by staff working directly on the program (Process Engineering 
staff as well as Quality Assurance staff), together with the indirect effort required of the 
project organization in attending special training sessions or audit 
activities. The tracked costs include developing processes, deploying, measuring, 
training, maintaining (packaging), developing infrastructure, and process improvement. 
The costs do not include project operations performing CM, QA, planning, etc., but do 
include the projects’ cost of participating in studies, training, and audits. 

For the period July 1994 through November 1998 (when the Level 5 rating was 
attained) the cost of the process improvement program was approximately 30 staff years 
of effort. This cost was primarily the cost of the organization’s process engineers 
responsible for defining and carrying out the improvement program. Fairly detailed 
records were kept in order to track this expenditure. Records of costs permitted the 
analysis of the distribution of effort across different functions and the shift of allocation 
from early months of the program to later months of the program. 
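
As a rough sense of scale, the 30 staff years can be set against the total effort available to the organization over the same period. The sketch below is illustrative only. It assumes a constant headcount (either the approximately 850 personnel cited for the study period, or the 800 persons over 4 years used in the title of Table 3-1), whereas actual staffing varied, and the function name is hypothetical.

```python
def program_cost_fraction(cost_staff_years, headcount, duration_years):
    """Share of total available staff effort spent on the process program."""
    return cost_staff_years / (headcount * duration_years)


# July 1994 through November 1998, counted inclusively, is 53 months.
duration_years = 53 / 12

# Assumption: headcount held constant at the figures quoted in the paper.
fraction_850 = program_cost_fraction(30, 850, duration_years)
fraction_table = program_cost_fraction(30, 800, 4)

print(f"{fraction_850:.1%}")    # 0.8%
print(f"{fraction_table:.1%}")  # 0.9%
```

Under either assumption the program consumed on the order of 1 percent of the organization's effort.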

The records of costs categorized the effort by 5 main areas of activity: (1) writing and 
maintaining written processes, (2) deployment of processes (working with projects via 
training and direct help in using processes), (3) creating and maintaining the 
infrastructure of processes (databases, libraries, etc.), (4) planning improvement 
including the writing of plans, carrying out studies and analyzing measurement, and (5) 
reporting and participating in reviews of the process program. 

Table 3-1 shows the distribution of the effort for these 5 major activities. Overall, the 
highest percentage of effort was allocated to the deployment activities. Process engineers 
focused on getting the defined processes into practice (shepherding) as opposed to only 
focusing on generating and maintaining the written standards, processes, methodology, 
etc. 
Table 3-1 Cost Distribution for Process (For Organization of 800 Persons Over 4 Years) 


Table 3-1 also indicates a shift in emphasis from writing and refining written 
processes to deploying those processes. The shift reflects that over time the 
process engineers realized that the largest value of the program was in interacting directly 
with the projects and not in merely producing and enhancing written processes. 

3.2 Value of the Process Improvement 

As noted at the start of this section, the impact of the process program was measured in 
three areas: value to the organization, value to business opportunities, and value to the 
products generated. 

3.2.1 Impact to the performance of the organization 

The first measure of the impacts of the improvement program was a determination of 
perceptions, general performance and structure of the organization as a whole. In 
general, it is a determination as to whether or not the personnel viewed the program and 
the changes as valuable to their own projects’ performance. This was determined by taking 
surveys, interviewing project personnel and managers and by soliciting feedback from 
customers. 

There were significant favorable impacts to the overall enterprise characteristics of the 
SEAS organization. These changes included both technology enhancements and 
operational impacts that supported a more efficient and effective structure. Specific 
impacts that were identified by both project personnel and managers across the 
organization included: 


1. The process improvement program resulted in a focus on achieving common goals for 
SEAS. With the formal improvement plan generated and with specific goals 
identified as part of the plan, there was a foundation established for all SEAS 
personnel to contribute to improvement. The improvement goals and overall program 
prompted project personnel to contribute to the overall SEAS improvement program 
as opposed to only their own project program. This was supported through the 
management reviews, process meetings, progress reporting and assessments (both 
internal and external) that were included as part of the program. The improvement 
program promoted the concept of SEAS operating as a well-disciplined enterprise 
rather than a set of individual projects with local goals and challenges only. 

Operating as an integrated organization also improved communication 
between projects (sharing lessons, improvement ideas, measurement approaches, and 
tailoring approaches for SEAS processes). 

2. The improvement program added a strong discipline for all projects to adopt and 
adhere to SEAS processes. The improvement program included the use of formal 
assessments such as ISO audits, SCEs, and internal audits. With the use of regular 
formal assessments and with the strong senior management support of the 
improvement program, all projects within SEAS had strong incentives to adhere to 
the processes and disciplines defined by SEAS. 

3. The improvement program resulted in a significant upgrade and improvement to the 
set of SEAS standards, policies, and processes. Since the program adhered to the 
concept that changed processes should be driven by needs and experiences of projects 
(as opposed to being changed to meet an external benchmark) and since ISO stressed 
the value of producing processes that were short, crisp and directed to the actual 
needs of the projects, the set of SEAS standards and processes was revised with a 
focus on project need and SEAS lessons learned. This resulted in a set of processes 
that the projects felt were much more in keeping with their specific needs. 

4. The improvement program promoted an accelerated adoption of needed technology 
change. The activities of the improvement program included the continual search and 
incorporation of enhancements that would lead to more efficient development and 
operations. There were several technology changes that were driven by this approach 
to sustain change. Such enhancements as the universal adoption of on-line, electronic 
documentation and the adoption of common CM tools were prompted by the 
improvement program. The goal of attaining full ISO registration was more easily 
addressed by producing complete on-line, electronic documentation. 

5. The accomplishments resulting from the improvement program produced a sense of 
pride and accomplishment for the entire SEAS organization. The recognition that 
SEAS received by achieving ISO registration and by attaining high maturity ratings 
with CMM was shared by all SEAS personnel. Since all projects and personnel 
participated at some level, the entire organization felt the recognition received was 
something of which each individual could be proud. 
3.2.2 Impact to business opportunities 

The second measure of value of the process improvement program is the impact it had 
on business opportunities. The improvements demonstrated by the SEAS program 
played a major role in winning new business for CSC. The improvement program in 
general demonstrated to potential clients that CSC was serious about and committed to 
process improvement. This fact alone can be a discriminator in selecting a support 
contractor. It is important that clients see a demonstrated program of sustained 
improvement. 

In addition to demonstrating an aggressive improvement program, CSC could point to the 
levels of achievement recognized by CMM and by ISO. These achievements are 
frequently used by potential clients in scoring capabilities of contractors. In the case of 
the SEAS achievements, at least 3 programs used the independent ratings (ISO and 
CMM) and the established processes as a consideration in selecting CSC for additional 
work. The additional work in 1999-2000 amounted to over $500M in contract value. 

The established SEAS processes were identified as key elements of the new work. 

3.2.3 Impact to the software products 

Probably the most important measure of success of any improvement program is the 
measure of product improvement. Have the products and services been favorably 
impacted by the changes made to process? 

The SEAS improvement plan identified 4 product measures that were part of the goals of 
improvement. The product measures included productivity, defect rates, cycle time, and 
estimation accuracy. From the start of the program in the Summer of 1994, detailed 
measures, records, and general information were recorded for the purpose of guiding the 
change and for tracking impact of any changes that were made. Details of the measures 
that were tracked and the results of analyzing the changes to the product measures were 
reported in 1998 at the time of the Level 5 rating. Details of these results can be found in 
Reference 7. 

By reviewing the detailed process ratings over many years along with the detailed 
product data (productivity, defect rates, etc.) an attempt was made to statistically 
determine the correlation between the process changes (increasing maturity level) and the 
product changes. The analysis showed a constant 6 percent/year improvement for both 
productivity and quality from the start of the program through the end of 1998. (See 
Figure 3-1). Further analysis showed that there was also a 6 percent/year improvement 
from 1987 through 1994. Attributing sustained product improvement to process change 
from this evidence is not conclusive. Improvements in technology, personnel, 
environments, as well as process change, remain as possible sources of the observed 
product improvements. 
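
To illustrate what a constant annual rate implies over the two periods mentioned, the following sketch compounds a 6 percent/year gain. Annual compounding is an assumption made here for illustration, since the paper does not state how the yearly rate was computed. The function name is hypothetical, and the program period is rounded to four whole years.

```python
def cumulative_gain(rate_per_year, years):
    """Total fractional gain after compounding a constant annual rate."""
    return (1.0 + rate_per_year) ** years - 1.0


# Periods taken from the text; annual compounding is an assumption.
before_program = cumulative_gain(0.06, 1994 - 1987)  # 1987 through 1994
during_program = cumulative_gain(0.06, 1998 - 1994)  # 1994 through 1998

print(f"1987-1994: {before_program:.0%}")  # 1987-1994: 50%
print(f"1994-1998: {during_program:.0%}")  # 1994-1998: 26%
```

Because the same rate applies before and after 1994, the trend has the same slope in both periods, which is why the trend alone cannot isolate the effect of the process program.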
Average Productivity for all Projects Active in Year 

• Consistent 6% per year productivity improvement, even prior to the CMM Level 1 rating 
• Also 5% per year quality improvement, even prior to the CMM Level 1 rating 
• No change in improvement rates after aggressive improvement programs started 


Figure 3-1 Productivity and Quality Trends Over Time 

When an attempt was made to correlate process maturity of projects with the product 
measures (Figure 3-2) there was no statistically significant result. The correlation was 
computed from data extracted from SCE reports generated for each project. Each project 
was reported compliant, partially compliant or not compliant with each Level 2 and 3 
Key Process Area (KPA). From this data, each project was assigned a maturity ‘score’ 
on a scale from 1 to 3. Product measures and the maturity ‘score’ were analyzed for a 
correlation between high maturity ratings and the high performance of each of the 
product measures (quality, productivity, cycle time and predictability). Correlations 
were all of low significance; the R values ranged from a low of 0.15 to a high of 0.49. 
There is no clear explanation for this, but the authors surmise that the strongest 
explanation is that process is simply a very difficult parameter to measure in isolation. 
Using a project’s maturity rating as the only measure of process may be too simplistic. 
This analysis is explained in more detail in Reference 7. Work on this 
analysis is continuing. 
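
The scoring and correlation steps described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the point values assigned to each SCE finding, the averaging rule, and the project data are all assumptions, since the paper states only that each project received a score on a scale from 1 to 3.

```python
from math import sqrt

# Assumed point values; the paper states only that scores ran from 1 to 3.
RATING_POINTS = {"not compliant": 1, "partially compliant": 2, "compliant": 3}


def maturity_score(kpa_ratings):
    """Average the per-KPA points into one project score between 1 and 3."""
    points = [RATING_POINTS[rating] for rating in kpa_ratings]
    return sum(points) / len(points)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up projects for illustration only: (KPA ratings, productivity).
projects = [
    (["compliant", "compliant", "compliant"], 11.0),
    (["compliant", "partially compliant", "not compliant"], 12.5),
    (["partially compliant", "partially compliant", "compliant"], 9.0),
    (["not compliant", "partially compliant", "not compliant"], 10.0),
]
scores = [maturity_score(ratings) for ratings, _ in projects]
productivity = [value for _, value in projects]
r = pearson_r(scores, productivity)
print(f"R = {r:.2f}, R squared = {r * r:.2f}")
```

If the reported R values are Pearson coefficients, squaring them gives roughly 0.02 to 0.24, so even the strongest relationship would account for about a quarter of the variation in a product measure.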



Average Defect Rate for all Projects Active in Year (horizontal axis: Year Project Active) 
Panels: Productivity; Defect Rate (axis label: CMM "Score") 

■ Detailed measurement data on 90 projects was accumulated (over 9 years). 

■ Data included accurate product data (cost, defects, size, etc.) and process data 
(based on assessments). 

■ Analysis to determine correlation between process maturity and product data 
showed minimal correlation. 


Figure 3-2 Impact of CMM Maturity on Cost, Quality, Manageability 


3.3 Relative Impact of Improvement Activities 

There were many activities undertaken and many avenues pursued with the goal of 
attaining the high maturity ratings and demonstrating improvements to the SEAS 
organization. Shortly after the Level 5 rating was achieved, a review of the lessons, 
activities and steps was held in an attempt to determine which of the steps seemed to be 
of most significant value (and which seemed to be of minimal value). 

Sources of information included surveys collected from project developers and managers, 
lessons learned reports generated periodically during the 4-year initiative, interactive 
workshops held (as part of the regular ‘Process Deployment Team Meetings’), and 
interactive discussions held with the process and quality assurance personnel. Personnel 
were asked to identify which activities had the most favorable impact on improving 
processes within SEAS as a whole and on projects specifically. Four activities 
were consistently rated as the most effective in leading to the success of the process 
improvement: 

• Shepherding 

• Process deployment team meetings 

• Library building with process evidence 

• ISO 

The first two activities were discussed in detail in Section 2. 
The evidence gathering/library building was an exercise requiring projects to produce 
specific evidence for key aspects of project processes. There were several benefits to this 
exercise: 

• It allowed the process engineers to review evidence and point out potential 
deficiencies (so projects could make adjustments) 

• It disciplined the projects into reviewing just how processes were being implemented. 

• It enabled the sharing of concepts across projects through the sharing of artifacts and 
the discussion of approaches at process deployment team meetings. 

• It helped to identify processes that may be misused or ineffective. 

ISO was almost universally identified as one of the most beneficial tools adopted in 
pursuing excellence in process within SEAS. Although CMM had been part of the 
culture within the organization for over 7 years, the use of ISO was identified as one of 
the top activities in attaining excellence. Several reasons were given for this: 

1. ISO addressed the entire SEAS organization as opposed to software projects and 
personnel only. This required that all personnel be involved in the concept of process, 
which resulted in SEAS becoming a fully integrated enterprise with process as a 
major theme. 

2. ISO was much easier to understand and to adopt than the full suite of CMM KPAs. It 
de-emphasizes process detail and focuses on understanding and applying the basics. 

3. ISO successes gave the organization a ‘can-do’ attitude which was reflected in a 
much higher level of confidence when more detailed reviews of CMM were 
addressed. 





Section 4 Lessons Learned 

As was noted previously, detailed records of the experiences, costs, impacts and general 
impressions of the overall activities were archived by the process improvement team. In 
reviewing this information and by carrying out extensive interviews with project 
personnel and managers, the successes and shortcomings were analyzed in an attempt to 
identify the most effective activities and approaches that led to the high maturity level of 
the SEAS Center. Seven points were gleaned from the experiences as reflecting 
the most important activities that an organization should adopt as part of its 
improvement program. 

Recommendation 1: Operate as a Level 5 Organization 

This recommendation suggests that an organization should not focus on sequentially 
addressing the CMM Levels from 2 through 5, nor should it focus on sequentially 
addressing individual KPAs. Instead, the most important element of the improvement 
program is to establish a culture of continuous improvement based on the goals and needs 
of that organization. An organization that practices ‘continuous improvement’ can be termed an 
‘optimizing’ organization (Level 5), and this has several key elements that should be 
established from the start: 

• Focus on improvement of the product (as opposed to merely improving process). 

Such goals as cutting defects or improving productivity or decreasing cycle time 
should be the measure of change; not the number of processes that are established. 

• Step 1 is to define the baseline of the products and process. That implies that the 
current product characteristics (cost, time, defect rates, effort distribution, etc.) must 
be captured along with the baseline of process characteristics (extent to which KPAs 
are satisfied). In addition to establishing the existing strengths and deficiencies of 
processes (via process gap analysis) one must generate a baseline or profile of the 
product characteristics. This information is the first step toward producing 
quantifiable information of the environment and is used to track impacts of process 
changes as well as to produce engineering models of the environment. Information 
for this baseline is collected from existing measurement data, surveys, project 
archives, interviews, and any other source of data that may provide some insight into 
the overall product profile. 

• A measurement program is a requirement at the start of the overall improvement 
program. Some models imply that a mature measurement program may not be a 
critical element of early stages of an improvement effort, but the concept of operating 
as a Level 5 requires that a measurement program be established immediately. The 
measurement program is required for 3 specific reasons: (1) to establish models of the 
environment, (2) to manage projects, and (3) to guide change. An example of basic 
models generated early in the improvement program is depicted in Figure 4-1. The
early data from SEAS was used to produce these models, which in turn are used by
managers and by process engineers. Such models can be generated very early in the
program and then may be continually refined as improved measurement data is
collected.


[Figure panels: NASA Center Software Product Characteristics (Cost Distribution),
shown by support activity and by development activity; Defect by Error Class (PCS);
Size of Change vs. Effort in Maintenance, with x-axis "Total SLOC Added, Changed, or
Deleted in Release" (0 to 20000), series for Enhancement Releases, Mixed Releases, and
Error Correction Releases, and a linear fit for Enhancement Releases:
Effort = (0.36 * SLOC) + 1040, R**2 = .75]

Figure 4.1 Sample Engineering Models of Process


• Both technical and management activities should be part of the improvement activity,
as opposed to management only. Not only are process attributes important to the
improvement program, but the selection and understanding of changing technical 
activities must be integrated into the program. This implies the continual infusion, 
tailoring and measuring of technical changes. 
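
The maintenance model in Figure 4.1 can be applied directly to a planned release. The following is a minimal sketch, assuming the fitted line for enhancement releases, Effort = (0.36 * SLOC) + 1040. The function name is hypothetical, and the unit of effort is not stated in the surviving figure text, so the result is in whatever unit the original model used.

```python
def estimate_maintenance_effort(sloc_changed: float) -> float:
    """Effort predicted by the Figure 4.1 line for enhancement releases.

    sloc_changed: total SLOC added, changed, or deleted in the release.
    The unit of the returned effort is an assumption left open here,
    because the source figure does not state it.
    """
    if sloc_changed < 0:
        raise ValueError("sloc_changed must be non-negative")
    return 0.36 * sloc_changed + 1040


# Evaluate the model across the range shown on the figure's x-axis.
for sloc in (0, 5000, 10000, 20000):
    print(sloc, round(estimate_maintenance_effort(sloc), 1))
```

With R**2 = .75 the line explains only part of the variation between releases, so such estimates are best read as a rough planning aid and refined as more measurement data is collected.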


Recommendation 2: Set Specific Incremental Gates 

Although the improvement program is viewed as a continuous, sustained program that 
has no completion criteria, incremental check points for the organization were a tool that 
accelerated the improvement efforts and acted as a catalyst for the program. These check 
points were most effective when they were performed by external reviewers; specifically 
SCE teams or ISO teams. 


In the period June 1994 through November 1998, seven independent reviews were 
conducted. Obviously one has to be cautious of overtaxing the development and project 
organizations by requiring excessive time in participating in reviews, but the periodic 
reviews do act as a vital tool in assuring that all personnel are reviewing their adherence
to processes and their awareness of the overall plans and goals of the organization. 

Internal audits should be part of any organization’s process program, but they do not 
replace the value of the reviews carried out by an independent, external team. 


For the SEAS organization (about 850 persons), formal reviews occurred
approximately every 6 months, sometimes more frequently. ISO surveillance audits
occurred every 6 months and the external CMM assessments occurred approximately yearly.

Recommendation 3: Adopt the Concept of ‘Separation of Concerns’ 

Another critical element of a successful improvement program is that of organization. 

Not only must there be strong support from senior management, but there must be a 
designated process improvement organization whose responsibilities include expertise in 
process models, CMM, ISO, process improvement concepts, measurement and available 
assets within the organization. With one organization focusing on the concepts of 
process improvement and focusing on the generation of Program-level assets to be used 
by projects, then projects can focus on the task of producing systems and software. 

In an ‘Experience Factory’ (Reference 8), one organization (PEO) is responsible for 
driving process improvement while the other organizations (projects) focus on the task of 
producing a quality product. It is not necessary that a project organization become expert 
in process models; it is only necessary that they work with the process organization in 
sharing information and adopting processes and assets made available to them. 

The ‘separation of concerns’ concept implies that the project personnel are experts in 
producing systems and the process organizations are experts in process improvement and 
associated activities. There is no need to train project personnel in the details of process
models such as CMM or ISO; it is only necessary that they understand and apply the process
assets provided by the process organization.

Recommendation 4: Deploy Processes to Projects

One of the most effective steps in attaining process maturity was found to be that of 
having the process engineers work directly with the projects in helping to define, apply 
and understand appropriate processes for their particular project. This activity is in 
contrast to that of having the process staff work on writing, refining, tailoring, and enhancing
written processes. The effort put forth in working directly with projects will be much 
more effective than generating additional written standards. 

Obviously there must be a written foundation describing the processes that are to be 
applied in the organization, but our experiences indicated that occasionally excessive 
effort is put forth in developing and refining written processes. The means by which the 
process engineers accelerate the ‘deployment’ of the appropriate processes is through the 
activity of ‘shepherding’ where process and quality engineers become experts in the 
organization’s baseline, then they provide services to the projects in explaining just how 
to tailor, implement, and sustain relevant processes on their projects. 

In addition to the shepherding activity, the process engineers should adopt the idea of
scheduling periodic (weekly on SEAS) ‘Process Deployment Team’ meetings where a
1-hour discussion of process implications and use is presented. All managers of the
organization are invited and the process engineers lead a discussion of a process topic; for
example, ‘How is the Quantitative Process Management KPA applied on a project in this
domain?’ or ‘What engineering models of the environment exist for our use and how do
we use them?’

It is the responsibility of the process engineers (SEPG in CMM terminology) along with 
the Quality Assurance office to provide services to the project organizations by 
identifying appropriate assets for the projects and helping them apply these assets without
burdening the projects with undue overhead.

Recommendation 5: Measure Improvement by Product Not by Process 

There is the commonly accepted belief that the quality of the software product generated 
is directly affected by the processes used to generate the product. For that reason, 
organizations implementing a process improvement program are in reality aiming to
favorably impact the end products generated by the development. They are anticipating
improvement measured by product measures, i.e., cost, defect rates, cycle time, accurate
estimation, etc.

Although this is an obvious and simple concept, organizations occasionally overlook the 
importance of continually tracking the end product to verify that improvements in process 
are meeting the goals of improving the product. Too often, we measure success as the 
attainment of certain CMM levels, or ISO registration or producing more extensive 
processes. Measuring and tracking the product change is often overlooked. Although it 
is very difficult to measure trends in products over a long period of time, the exercise of 
establishing goals, defining measures, and capturing the starting point of these measures 
is valuable in itself. It provides the discipline of understanding the projects and
understanding the environment through the generation of models, goals, and applied 
measurement. 

Senior managers as well as clients often pose the challenge of proving the worth of the 
process improvement program. Instead of arguing that these people ‘...just don’t
understand the value of process...’, the process organization must be prepared to respond
to such challenges with specific measures that represent the product; not only the process. 
The questions are very appropriate questions and the measurement program must 
concentrate on continually capturing product attributes so that such questions can be 
addressed; even when the results may not show the expected benefits of the program. 

Recommendation 6: Allocate Appropriate Resources 

The activity of process improvement as well as process in general, requires effort. 
Although the goal is to have the process improvement activity produce a greater return on 
investment than the cost of the investment, the overall activity still requires a sustained 
effort. It is recommended that any organization identify the level of resources that it will 
commit to sustain the processes and process improvement program, then adhere to that 
commitment as it would with any project. It is a mistake to assume that this activity can 
be absorbed as ‘no cost’ by merely requesting that project personnel devote several hours 
per week on the activity and that specific resources do not have to be allocated. From the
experiences at SEAS, this approach will not adequately support the process program. 

Based on nearly 8 years of experience with organizations of varying size, it was found
that the typical allocation of resources for the process program was approximately 1% to
1.25% of the size of the entire organization. This effort is in addition to the specific
project activities that will require additional resources. It also is recommended that the 
Quality Assurance activities allocate from 1.25% up to 2% of the organization that it is 
supporting. 
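
As a worked illustration of these percentages, the sketch below converts an organization's headcount into staffing ranges. The function name and the returned structure are hypothetical; only the percentage bounds (1% to 1.25% for the process program, 1.25% to 2% for Quality Assurance) come from the text above.

```python
from typing import Dict, Tuple


def process_staffing(org_size: int) -> Dict[str, Tuple[float, float]]:
    """Return (low, high) staff ranges implied by the stated percentages."""
    if org_size <= 0:
        raise ValueError("org_size must be positive")
    return {
        # 1% to 1.25% of the entire organization for the process program
        "process_program": (org_size * 0.01, org_size * 0.0125),
        # 1.25% to 2% of the supported organization for Quality Assurance
        "quality_assurance": (org_size * 0.0125, org_size * 0.02),
    }


# SEAS is described as an organization of about 850 persons.
for activity, (low, high) in process_staffing(850).items():
    print(f"{activity}: {low:.1f} to {high:.1f} staff")
```

For 850 persons this gives roughly 8.5 to 10.6 staff for the process program and roughly 10.6 to 17 for Quality Assurance, before any project-specific effort is added.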

Table 4-1 shows the relative cost of the process activities for different size organizations. 
The data is based on direct experiences of SEAS over the 8 year period. 


■ Requires 0.8% to 1.3% for process improvement activity
■ Quality Assurance requires from 1% to 1.5%
■ Spend 2 to 3 times more effort deploying versus writing processes

Program Size    0-20% Software    20-40% Software    40% Up
70 - 150        1.5 FTE           2.0                2.5
150 - 400       2.0 - 2.5         2.5 - 4.0          3.0 - 1.5
400 - 900       3.0 - 4.0         3.5 - 4.5          4.5 - |,0
900 - 1700      3.0 - 5.0         4.0 - 6.0          5.0 - 7.0

Table 4.1 Allocate Appropriate Resources (Based on SEAS History)

Recommendation 7: Produce 3 Specific Documents Early

There are numerous activities that must be addressed when an organization initiates a 
process improvement program and there are several products that also must be 
considered. Based on the SEAS experiences, it is recommended that 3 specific 
documents be produced or at least planned when the process program is established. 

The 3 documents include: (1) Quality Management System (QMS) document, (2) process 
improvement plan, and (3) profile of the organization. 

1. The QMS is a required document of ISO-9001 and has proved to be an extremely
valuable handbook for SEAS as well as other organizations that have produced such
a document. It has been used as an orientation guide for new employees and is a
valuable reference for all personnel in characterizing the business operations of the
program. It is recommended that the document capture:

• Description of the organization and the staff (roles and responsibilities) 

• Description of the processes in place including their application. 

• Standards, policies, methodologies, handbooks and general guidance

• Overall process planning (measurement program and process improvement 
program) 

• Description of how the organization complies with required benchmarks (ISO, 
CMM, SA-CMM, etc.) 

2. The Process Improvement Plan (PIP) describes the goals, responsibilities, and 
approach to attaining the improvement goals. It adds the structure of a project to the 
activity with schedules, milestones, and most importantly, specific goals. The goals
should include product as well as process goals. 

3. The ‘Profile’ of the organization captures the general state of process usage by 
carrying out some type of gap analysis, but the bulk of the document should contain 
the product characteristics. This is the first step toward the goal of engineering 
software by producing quantifiable information. Sample recommended product 
information includes: 

• Amount of software in development and in maintenance 

• Distribution of effort across the life-cycle phases 

• Typical staffing profiles 

• Defect characteristics (number, type, severity) 

• Testing profiles 

• Maintenance costs per size of unit

• Typical software cycle times (time to develop per size, time to make changes) 

• Variance in initial estimates vs. final actuals (size, cost, schedules) 

Section 5 Conclusion


Over a 5-year period, the CSC SEAS Center carried out an aggressive process 
improvement program that resulted in an optimizing culture throughout the organization. 
The CMM Level 5 rating, achieved in November 1998, verified the success.

Focusing the success of the process improvement program on specific product goals, and 
using the compliance with industry benchmarks as a tool has helped make process 
improvement part of the SEAS culture. The QIP of the Software Engineering Laboratory
(SEL) was used as the model for improvement and other industry benchmarks served as
tools in achieving documented product goals. This paper describes aspects of the process 
improvement program that were key factors to the successful achievement of the CMM 
Level 5 rating. 

The value of the investment made in process improvement was shown to be significant 
for the overall operations of the Center as well as the business opportunities. The 
quantitative value on product improvement was shown to be very difficult to determine 
and no conclusions could be made there. 

As a result of the five years of activity, the SEAS Center produced seven 
recommendations that any organization should follow in implementing a process 
improvement program. These recommendations focus on building a culture of continuous 
change and improvement throughout an organization. 

Abstract.

This paper describes the development of an
experience factory in an Australian organization.
Information structures were well developed and
used in the daily work of the organization. This
included the use of network technology as well as
the personal interaction between department
members. Highly motivated personnel drove
improvement via new techniques, knowledge, and
tools. A special focus existed to simplify work
tasks through tool support. Daily work and
problem solving were strongly based on personnel
interaction and access to knowledge bases
(documentation, mail lists, etc.). The goal of the
project was to package personnel experience and
best practices and provide an effective framework
for access and integration. The system was
decommissioned shortly after the completion of the
project. The reasons for this are discussed.

1. Introduction 

Faced with improvement needs, in 1998 the 
company started to put special attention on 
approaches to support improvement activities in a 
structured way. Like many organizations in the
software industry, improvement aspects and 
strategy issues ranged from product quality and 
project management to the overall improvement of 
software engineering skills. Further local 
improvement aspects had been identified in 
software process assessment using the CMM 
(Capability Maturity Model from the SEI [1]) and 
the ISO 9001 standard. 

At the end of 1998, a project was started in
co-operation with The Centre for Advanced Empirical
Software Research (CAESAR) to evaluate the 
Experience Factory (EF) / Quality Improvement 
Paradigm (QIP) [2] concept. The concept was to 
be evaluated as an approach to support local 
improvement activities and to be applied as an 
approach in the given environment at the R&D 
department. The aim was to find a suitable 
approach within six months and to start realization 
of benefits as early as possible. 


The choice of the EF / QIP concept was motivated 
by several aspects. Firstly, it was seen to be a 
promising concept that had been the subject of 
research projects in the past such as PERFECT 
(ESPRIT III project, sponsored by the CEC [3]). 
Secondly, the concept had already been applied in 
other organizations such as the Software 
Engineering Laboratory at NASA [4] and Daimler 
Benz AG [5]. Thirdly, the EF/QIP concept reflects
the state-of-the-art in the field of improvement 
approaches, and therefore is of interest to the 
company. 

The focus of the project was guided by five 
questions. 

(1) Where has the EF concept already been
applied, and what have been the experiences
with it?

(2) What are the important characteristics of the
company’s environment, and of the company’s
philosophy, which need special consideration?

(3) Is the EF approach applicable considering the
environment specifics in the organization?

(4) If (3) is true: How has the EF approach to be
tailored so that it fits the needs and 
characteristics of the organization? 

(5) If (3) is false: How can an organization- 
specific approach be developed which 
considers EF principles? 

Principles of the classical EF approach 

The EF approach describes an organizational 
framework, which addresses the issues of product 
and process improvement in software development 
organizations by providing an environment for 
continuous improvement. The EF approach defines 
an environment for controlled experimentation, 
knowledge reuse, experience packaging, and 
analysis of the development processes. The 
improvement environment consists of two parts: 
the project organization (PO) and the experience 
factory organization (EFO). Each of these follows
distinct steps in the Quality Improvement Paradigm
(QIP). The project organization's major aim is to 
deliver software products according to given 
requirements. The PO uses information to improve, 
say, the product quality, the project performance or 
the reliability of project planning. 

The Quality Improvement Paradigm 

The QIP is the main driving force for continuous 
improvement and is integrated in both the PO and 
the EFO. It is defined as consisting of six steps [6]:

1. Characterize the current project and its 
environment with respect to existing models 
and metrics. 

2. Set the quantifiable goals for successful 
project performance and improvement based 
on the first step and the business and project 
specific goals. 

3. Choose the appropriate process model and 
supporting methods and tools for the project 
and define a project plan, which considers the 
decisions and definitions made in steps 1 and
2.

4. Execute the process, construct the products,
collect and validate the data, and analyze it to 
provide real time feedback. 

5. Analyze the data and evaluate the current 
practices, determine problems, record findings, 
and make recommendations for future project 
improvements. 

6. Package the experience in the form of updated 
and refined models and other forms of 
structured knowledge gained from this project.
Save it in an Experience Base to be reused in 
future projects. 
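
The ordering of the six steps can be made explicit in code. This is a minimal sketch, not part of the QIP definition itself: the member names are shorthand chosen for this example, and treating step 6 as leading back to step 1 for the next project is an interpretation of "continuous improvement".

```python
from enum import IntEnum


class QIPStep(IntEnum):
    """The six QIP steps in order (member names are shorthand for this sketch)."""
    CHARACTERIZE = 1    # characterize the project and its environment
    SET_GOALS = 2       # set quantifiable goals
    CHOOSE_PROCESS = 3  # choose process model, methods, tools; define the plan
    EXECUTE = 4         # execute, collect and validate data, give feedback
    ANALYZE = 5         # analyze data, record findings, recommend improvements
    PACKAGE = 6         # package experience into the Experience Base


def next_step(step: QIPStep) -> QIPStep:
    """Advance one step; after PACKAGE the cycle restarts for the next project."""
    return QIPStep(step % len(QIPStep) + 1)


# Walk once around the cycle, starting from the first step.
step = QIPStep.CHARACTERIZE
for _ in range(len(QIPStep)):
    print(step.value, step.name)
    step = next_step(step)
```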

The PO interacts during the project with the EF 
organization (EFO). The EFO supports it with 
knowledge and experience gained in the past and 
provides feedback about the performance and 
quality of the current project while analyzing the 
data provided. The task of the EFO, besides 
support during the software life cycle, is to package 
experience gained during projects in a reusable 
form and to store it in an Experience Base (EB). 

The interacting PO and EFO realize two feedback 
loops, a project feedback loop that takes place in 
the execution phase (support & analysis), and an 
organizational feedback loop that takes place after 
a project is completed (analysis & packaging). The 
second feedback loop changes or improves the 
organization’s understanding of software 
development by packaging and reusing experience 
and making it accessible to future projects. 
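
The two feedback loops can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions: the class names, method names, and the topic-keyed storage are hypothetical, and the stored finding is invented for the example rather than taken from any project.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperienceBase:
    """Stores packaged experience keyed by topic (hypothetical structure)."""
    packages: Dict[str, List[str]] = field(default_factory=dict)

    def store(self, topic: str, lesson: str) -> None:
        self.packages.setdefault(topic, []).append(lesson)

    def retrieve(self, topic: str) -> List[str]:
        # Return a copy so callers cannot modify the stored experience.
        return list(self.packages.get(topic, []))


@dataclass
class ExperienceFactory:
    base: ExperienceBase = field(default_factory=ExperienceBase)

    def project_feedback(self, topic: str) -> List[str]:
        """Project loop: support a running project with past experience."""
        return self.base.retrieve(topic)

    def organizational_feedback(self, topic: str, findings: List[str]) -> None:
        """Organizational loop: package findings after a project is completed."""
        for finding in findings:
            self.base.store(topic, finding)


efo = ExperienceFactory()
# Invented finding, for illustration only.
efo.organizational_feedback("reviews", ["Checklists shortened review time"])
print(efo.project_feedback("reviews"))
```

The point of the separation is that projects only read from the base during execution, while writing happens in the packaging step after completion.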


How to build and run an EF 

To start an EF there are two possible approaches: a 
top-down or a bottom-up approach. That is,
proceeding from defining processes, structures,
products, and responsibilities to collecting concrete
experience data, or else collecting data and
proceeding back up a similar hierarchy. Basili and
McGarry [7] propose a top-down approach, which
aims to define and establish the required elements
before the improvement activities and the data
collection take place. This provides a guiding and
more or less stable structure and the time to focus
on analysis of results and products rather than on
integrating changes in the structure while working
with them. Five key steps characterize the
described top-down approach: (1) Obtain
commitment, (2) Establish structure, (3) Establish
processes, (4) Produce baseline, and (5) Identify
potential changes.

The EF at the SEL-NASA

The Software Engineering Laboratory (SEL) was
started in 1976 at the NASA / GSFC comprising
three organizations: NASA / GSFC Flight
Dynamics Division, University of Maryland
(Department of Computer Science), and the
Computer Sciences Corporation (Flight Dynamics
Technology Group). Its goal was to understand and
improve the software development process and
products within the GSFC Flight Dynamics
Division. In this environment the EF concept was
developed and first published in 1985 by V. Basili
(with a later version in [2]) as a concept based on
the research and experience of the SEL. Since then
the EF has been successfully applied in the NASA
environment and used in more than one hundred
projects dealing with different improvement issues
and technologies. The experiences range from
detected impacts through the use of EF on product
and process attributes, to recommendations as to
what to consider when establishing an EF.

The EF at the Daimler Benz AG 

Software plays a major role in the product range at 
Daimler-Benz. Outside of the SEL, the Daimler
Benz experience is the only other report directly 
related to the establishment of an EF in a practical 
development environment. Furthermore, they 
describe their experiences in the first year of the EF 
project, which was significant to our need to 
establish benefits in a short time period. Three 
separate projects formed the basis of analysis. 
Project A was in the aerospace domain with mainly 
in-house software development of large embedded 
systems and rigid real-time constraints. A 
measurement program had already been
commenced. The goals were to make improvement
efforts persistent and repeatable, project effort
predictable, and to support technical reviews. The
initiative comprised two application projects and
2-3 people were concerned with EF activities. Project
B involved small embedded systems. The
development changed from contractors to in-house 
in recent years. The goal was to build core 
competencies and clarify development questions 
such as how to keep software portable, and how to 
make sure that each planned function was 
implemented. Review techniques were identified as 
potential support for this. The initiative comprised 
1-2 application projects and 2-8 people were 
concerned with EF activities. Project C dealt with 
large administrative software units for managing 
internal business processes. Software requirements 
were defined in-house, but the development was 
outsourced. The focus for the EF was quality 
assurance, especially in outsourced development. 
In this case the initiative comprised 3 application 
projects and 2-3 people were concerned with EF 
activities. 

For projects A and B the company followed the 
top-down approach discussed above. The 
measurement of the baseline started several months 
after the EF initiative. This first stage consisted of 
the definition of essential EF structures, processes,
roles, and products. They also decided in project B
to assist technical reviews and collect related data
to help solve current problems. This was done
without defining structures, and is therefore seen as
a bottom-up activity. 

For project C, they decided to collect potentially 
useful data immediately after the definition of 
fundamental goals using a one-day workshop. The 
EF elements like processes, tasks, and product 
structure were only defined when demand for that
occurred. This characterizes an evolutionary 
approach and is seen by the authors as a bottom-up 
approach. The main reasons to follow this approach 
were: 

(1) “The immature practices needed to be 
improved rather quickly, but they did not 
require highly sophisticated analysis 
techniques or experience structure
documents.

(2) Structures would not be stable anyway.

(3) People were the bottleneck. Effort needed to 
be concentrated on content first.” [5] 

The choice between a top-down or bottom-up
approach was further influenced by the opinion that
stable and mature structures are needed for a
top-down approach.

The experiences to date which were of most 
interest to this project were: 

(1) Pros & cons of a top-down approach: The
definition of the EF elements in the top-down
approach makes it easier for the EF participants to
recognize the existence of the EF but provides fewer
concrete early benefits for them. The approach
cannot be performed without a close connection
between the EF and the processes that are in place.

(2) Pros & cons of a bottom-up approach: It 
may enable a swift realization of the EF results. 
Results are visible in a short time, but this effect 
cannot be planned and it is often hard to prove the 
usefulness beforehand, making visibility of the EF 
benefits more difficult. 

(3) There are many sources of reusable 
experiences and measurement is just one of them, 
e.g., intermediate products (like a QA plan) are 
often seen to be more useful for reuse than concrete 
experience packages, even when their impact has 
not been analyzed. 

(4) There were no problems in handling and 
structuring the data. Collecting data and qualitative
experiences were the bottleneck. 

The EF in the PERFECT project 

The PERFECT project is an ESPRIT III project
funded by the Commission of the European
Communities (CEC) and started in the early 1990s.
Organizations like Daimler Benz, Siemens, Q-Labs / Ericsson,
and the University of Kaiserslautern / Fraunhofer
IESE came together with the aim to find a more
detailed and tailored approach for the introduction 
of an EF into organizations. The benefits seen for 
the approach used include explicit goal setting, 
focus on products, establishment of a separate 
organization driving the improvement program, 
and the tailoring of the activities to specific needs. 
This is a realization of the principles stated in the 
EF concept [2]. 

2. Establishing the EF Goals and 
Methods 

The following points were seen as important in 
establishing the strategy that would be adopted in
the organization. 

• Arguments exist that the EF assumes a stable 
environment, but that this is not suitable for all 
companies. Their environments may be too 
dynamic because of short technology cycles. Some 
organizations argue that stable structures might
hinder progress and innovation. 

• The time aspect is a critical point. The time 
frame for first results seems long when following 
the top-down approach. This can cause problems 
maintaining participant motivation and 
management commitment. 

• The present EF / QIP approach remains a 
general, abstract framework, which lacks explicit 
implementation guidelines and detailed experience 
reports which are needed in industry. The data that 
is available at the moment is either experimental or 
based on a long-term application. 

• The EF originated from a scientific and
government environment at SEL-NASA and
proved suitable after long-term application. Are the
results transferable to software companies in
general?

• The bottom-up approach trialled at
Daimler-Benz seemed to work, as did the top-down
approach. There is no detailed data for a 
comparison of the results of the two approaches. 
The bottom-up approach brought earlier results. Is 
a bottom-up approach the better alternative when 
preliminary processes and an understanding of the 
environment already exist? 

• The EF / QIP approach requires a high degree 
of experimentation to evaluate techniques. Some 
companies, especially large ones with R&D 
departments, have the resources to do that but is it 
feasible in smaller companies? Often improvement 
decisions and technology adoptions have to be 
made much faster than is possible by using pilot 
projects. 

• It is not completely clear whether
improvements achieved in areas such as reuse
and productivity have their root cause in the
introduction of EF concepts or in the successful
application of technology. Would the switch to
promising techniques such as OO without the
introduction of EF have had the same effect?

Setting the project goals

Based on this analysis it was decided that: “The
project aim is to develop tools and techniques to
improve the speed and quality of software
development and to enhance the transfer of process
knowledge between projects and project groups.”
In the organization there were six improvement
initiatives present: (1) process tailoring, (2) CMM
and ISO 900x assessments, (3) personnel skill
improvement, (4) company improvement strategy,
(5) self-motivated tool development and tool
integration (innovative spirit), and (6) the 
measurement program. 

Thus several improvement activities were already 
present and action plans defined from the results. 
What the organization needed was a framework to 
support and focus the related actions. Process 
tailoring and definition existed and were already 
applied in parts of the department. Further action 
was needed to spread them out across the whole 
department and reuse experience gained during the 
initial implementations. It was not the main goal to
achieve a state such as CMM level 5, which was
seen as a hindrance to the company philosophy
of establishing an environment that is
reliable and repeatable but not overly defined.
In the organization the developers initiate a
great part of the improvement activities. They
identify problems and possible solutions, take
ownership and develop solutions in the form of
tools or work instructions. This was to be supported
and recognized. The present personnel skill
improvement activities were to be supported as
well as the team spirit and the overall interaction /
communication. It was viewed that stable and fixed 
structures tend to hinder that. The company 
strategy and goals had been broken down into 
improvement activities at the project and 
development level. The project needed to focus 
and refine these (GQM). The existing 
measurement program was showing promising 
results and indicating new improvement items. 

It was determined that a bottom-up approach could 
build on current measurement and initially defined 
processes and could immediately deliver data 
associated with known improvement issues. It 
would also give incentive to the desired tool 
development. Next we set out to determine 
whether environmental conditions would also 
support a bottom-up approach. The situation in 
each of the project teams is significantly different 
with respect to techniques and tools deployed. 
There was an identified need to identify best
practices, to document experiences with them, and
to support the transfer of knowledge between
project teams. It was obvious that the concept of
the experience base could help. The information
access environment was focused on network
technology. Every project team had an internal /
external homepage to spread information and
a project server with related documents, and a central
mail and document repository existed to get
information around and to document daily
experiences. Document templates gave the
information an identical structure to improve
readability and to ensure consistency of data. The
mailing and posting repository (Microsoft Outlook) 
had proved its usefulness in recent years by giving 
a basis for discussions and to disseminate 
information. Motivated by the general 
improvement spirit in the organization, the usage 
frequency of this repository was fairly high. It was 
possible to consider using this already-existing, 
documented experience for an Experience Base 
(EB). But what was still needed was an effective 
access technique for the information stored, e.g., a 
search engine. 

Both the normal daily work and solution seeking 
resulted in high interaction between department 
members. People were identified as having special 
knowledge regarding different development fields 
and the general attitude was to provide others with 
this knowledge when needed. From unstructured
interviews with team members it was identified that
it would be useful to package the experiences
(daily work knowledge). Initial examinations of the
amount of already documented knowledge in
reports and mail archives showed that a basic
knowledge & experience base already existed on
the Intranet but was not yet efficiently usable.
Because of the lean hierarchy in the department,
self-motivated improvement activities and the
integration of developer opinions were encouraged
and simplified. This leads to an environment that is
driven dynamically by the team members.

To summarize, information structures were well
developed and used in the daily work. This
includes the use of network technology as well as
the interaction between department members.
Highly motivated personnel drive improvement via
new techniques, knowledge, and tools. A special
focus existed to simplify work tasks through tool
support. The daily work and problem solving were
strongly based on personnel interaction and access
to knowledge bases (documentation, mail lists, 
etc.). The goal therefore had to be to package 
personnel experience and best practices and 
provide an effective framework for access and 
integration. 

From these findings we were convinced that the 
organization should establish an improvement 
environment based on the EF concept, but that the 
appropriate approach was bottom-up. 

We defined the EF concept for the organization 
based on five steps: 

Step 1 collect experience and knowledge,

Step 2 publish the experience documents and
provide an access framework,

Step 3 integrate experience in an environment where
it is needed,

Step 4 analyze how the experience repository is
used, and

Step 5 extend the structures of the improvement
environment when the need occurs.

What is different from the classic EF approach &
concept?

The main difference between the EF/QIP concept
described in [2] and the concept we describe is in
the overall philosophy. First, we favored a
bottom-up approach that provides usable
experience from the beginning rather than spending
time defining processes and structures for a
top-down approach. Moreover, our approach places
knowledge management and integration in the
center to serve as a driving force for continuous
improvement. This is quite different from the EF,
which uses the QIP [2] and the GQM [8] as driving
forces.

One main concept of the classical Experience 
Factory is experience generation and explicit 
experimentation with new technologies to evaluate 
them and to measure their impact on product and 
process characteristics. Our approach moves away
from explicit experience generation and focuses on
gathering existing experience and supplementing it
as it grows using access technology. Furthermore,
it is not based on the principle of gathering
experience from experimentation. Rather, the
approach uses experience gained with software
engineering techniques and new software
technologies in the daily work. Experience transfer
supports the growth of the experience inherent in
the environment.

Another difference is that our approach describes 
how to start the implementation of the first cycle 
(gathering of existing experience). Our approach 
allows both the improvement structures and the
development environment to mature over time. As in
the EF framework, our experience management
environment (EME) supports the documentation
and storage of everyday experience. Furthermore,
both approaches give structure to establish a
continuous improvement environment. The EME is
seen to be more evolutionary and able to be
adjusted to special needs. The EF gives a
predefined structure to be established and therefore
changes the existing way of doing things.

Requirements for application 

Due to the fact that this approach was motivated by 
environmental characteristics, there exist certain 
requirements for the application. If another 
organization intended to apply our approach it 
should check the following characteristics, which 
we see as minimal requirements. 

• A highly used and developed network
environment has to be present and integrated in the
daily processes.

• Information repository structures need to be
present in the environment, i.e. an Intranet structure
using mail archives, project servers, document
servers, etc. is needed.

• At least initial processes have to exist, which 
define when certain information has to be 
documented, e.g., meeting notes. 

• For the documentation style, corporate 
templates should exist, which give information a 
common structure.

• Activities have to exist which serve to identify 
improvement needs outside the knowledge 
management focus, e.g. CMM assessment. 

• There has to be a conviction that there exists a
high amount of already documented experience and
knowledge in the environment. This documentation
could include things like process
documentation, reports, mail archives, web pages,
etc.

• The staff have to be self-motivated to search 
for experience or knowledge. 

• The staff have to be self-motivated to 
experiment with new technologies for the 
improvement of products and their skills. The
company philosophy should support this by
encouraging the staff to do so, e.g., planning time for
that and recognizing those activities and the results.

• The organization must have an attitude to let 
the developers drive changes influenced by strategy 
and improvement goals. What this also assumes is 
that there exists a company thinking rather than an 
individual focus. 

• An organization needs resources to establish 
the concept framework and to maintain it while it 
matures and grows. 

• Initial process and project environment 
definitions should be available, which build a 
context for experiences and which can serve as 
success story examples. At least one project 
environment should exist, which has documented 
experience with the introduction of a defined 
development process. 

• The project management staff should be open 
to constructive suggestions concerning
improvements to their development processes. 

3. The organizational solution

The project was initiated by a senior manager with 
a reputation as a successful champion. Staff were 
involved in all aspects of the initial concept design 
and subsequent implementation. A project manager 
from the organization was assigned the task of 
overseeing the experience factory project. Other 
staff were involved via seminars, individual 
consultations and an experience factory website 
established as a result of requests from the first 


staff seminar. Thus it was with confidence that we 
embarked on the technical design and 
implementation of the experience factory in the 
organization. 

Since we were convinced that the environment 
already contained a significant amount of 
documented experience (the mail archive contained 
around 8500 documents six months after 
commencement), we began by finding an 
appropriate technology to gather this experience 
and make it searchable. Using the tool we selected 
(Microsoft Site Server 3.0) we also had a 
framework to make the gathered experience 
accessible via a web site. Therefore we created a
separate web page providing the interface to the
indexes. This also provided the integration step
into the staff's daily work. When someone wanted to
find existing experience about a task or general 
information from the environment they could now 
do that using the web page. 

Evaluation of indexing tools 

The first step in the evaluation was to define the 
requirements for an appropriate indexing tool. 
These were separated into two kinds of 
requirements. We defined requirements for a 
surface-evaluation, i.e. an initial evaluation to 
check basic functionality, and when a tool passed 
these we evaluated it against further interface and 
behavior related requirements. In Table 1 they are 
marked as RSx (surface) and RDx (detailed) 
requirements: 


Table 1. Requirements

Requirement   Description

RS1           Price: the price of the tool shall be reasonable, preferably
              freeware.

RS2           File types: the tool shall be able to create an index of common
              files including Microsoft Office documents, HTML, PDF, MS
              Exchange files.

RS3           Interface type: the tool shall provide a web-based interface to
              apply queries on the document index to find appropriate
              information. It shall at least be possible to redirect the query
              input and output from a web site to the tool and vice versa.

RS4           Scalability: the tool shall be scalable, i.e. the amount of
              indexed documents and users should not be limited.

RS5           Gathering: the tool shall be able to gather documents over a
              Microsoft NT network. Tools running on a Unix machine but able
              to access NT would also fulfill this requirement.

RD1           Performance: the tool shall be reasonably fast, e.g., search
              queries shall be answered in less than a minute, re-indexing
              shall be possible over a weekend.

RD2           Maintenance: the tool shall provide a mechanism to automatically
              update the search base (scheduled builds).

RD3           Interface style: it shall be possible to provide the user with a
              short description of query matching documents and to modify the
              style of the interface.

RD4           Access rights: the tool shall be able to give a user only access
              to those files to which he has access over the NT network.
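The two-stage screening described above (surface requirements first, detailed requirements only for tools that pass them) can be sketched as follows. This is a minimal illustration, not the evaluation procedure actually used: the requirement IDs come from Table 1, while the class name, the function name, the candidate tool names, and which requirements each candidate meets are hypothetical placeholders and not results from the study.

```python
from dataclasses import dataclass, field

# Requirement IDs taken from Table 1.
SURFACE = ("RS1", "RS2", "RS3", "RS4", "RS5")
DETAILED = ("RD1", "RD2", "RD3", "RD4")


@dataclass
class Candidate:
    """A candidate indexing tool and the requirement IDs it was judged to meet."""
    name: str
    met: set = field(default_factory=set)


def screen(candidate):
    """Return (stage reached, unmet requirements at that stage).

    A tool is checked against the detailed requirements only if it
    meets every surface requirement.
    """
    unmet_surface = [r for r in SURFACE if r not in candidate.met]
    if unmet_surface:
        return ("surface", unmet_surface)
    unmet_detailed = [r for r in DETAILED if r not in candidate.met]
    return ("detailed", unmet_detailed)


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("tool_a", {"RS1", "RS2"}),
    Candidate("tool_b", set(SURFACE) | {"RD1", "RD2"}),
    Candidate("tool_c", set(SURFACE) | set(DETAILED)),
]
report = {c.name: screen(c) for c in candidates}
```

A tool whose report entry is `("detailed", [])` has passed both stages; how unmet detailed requirements are weighed against each other is left open here, since the text does not say.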




Based on the results of this evaluation we decided,
after three weeks of tool evaluation, to use Microsoft
Site Server 3.0 (MSS) as the tool to gather and
publish our environment, experience & knowledge
documents, and to build the base for our experience
management environment (EME). The network
environment (Intranet) was a Microsoft Windows NT
network. Clients were running Windows 98,
Windows NT, or Windows NT Server. Furthermore, a
couple of servers running Unix were connected, and
their file systems could be accessed over the NT
network from other non-Unix machines. The
document sources are summarized in Table 2.


Table 2. Document Sources

Document source:   Mail Exchange Server
File types:        Mail format (exch)
Information type:  Folders for past and current project information,
                   technology discussions, reuse items, etc.

Document source:   Project and department web server
File types:        HTML, Microsoft Office documents (DOC, XLS, PPT),
                   Adobe Acrobat PDF, database files (SQL, Access)
Information type:  Project documents ranging from code to process
                   descriptions, general department information like
                   administration tasks

Document source:   Local workstations
File types:        Microsoft Office documents (DOC, XLS, PPT), Adobe
                   Acrobat PDF, HTML, plain text (TXT)
Information type:  Documents gathered for own information purpose,
                   document drafts


4. Analysis of Usage 

The analysis step consisted of a preliminary analysis
of the web access log data. The tool provided
us with the functionality to create usage reports. In
addition we conducted a survey on the benefits the
people observed while using it. In three months we
were able to implement the structures for the four
steps ‘collect’, ‘publish’, ‘integrate’, and ‘analyze’.
We were able to identify items for extension 
activities (step ‘extend’) from these results and 
from user feedback, which helped to focus on the 
future. 



The growth of our document search repository over 
time was influenced by three factors: 

• including more document sources in a specific 
catalog, 

• including more types of documents in a catalog 
definition, and 

• the growth of experience & knowledge 
documents in the environment over time. 

The number of documents changed with every 
build cycle for the catalogs. Numbers of gathered 
documents in the search repository have been 
documented and are shown in Figure 1, together 
with the factor which mainly influenced the 
growth. Here we see a significant growth in 
documents available after a relatively short period 
of time. 


Figure 1. Growth of Documents in the Search 
Repository 

Figure 2 shows the growth of the mail repository 
which results largely from daily work. The figure 
shows that in the last four weeks of the project the 
growth in the mail repository was 1,800 new 
documents. This is not to say that every added
document is indeed useful as a reusable experience, 
but it indicates that daily work items were 
documented and shared. 















Figure 2. Growth of documents in the
mail repository

In Figure 3 we show the use of the repository over a
seven week period. This figure shows that, in the
early stages, people became more and more aware
of the repository and more people tested the
repository with their personal information needs
(peak in the third bar). After that the usage
frequency was lower, more stable and continuous,
indicating possible acceptance. In this figure, a
visit is defined as a series of consecutive requests
from a user to an Internet site and a request is a
successful connection to an Internet site, i.e.
retrieving contents. The graph shows the
distribution of the number of different users
visiting the web site as a percent of total visits over
the period.

Figure 3. Usage of the Search Repository
per week
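The definition of a visit as a series of consecutive requests from one user can be sketched as a simple grouping over access-log entries. The text does not say how the reporting tool decides that one series ends and the next begins, so the 30-minute inactivity threshold below is an assumption, as are the function name, the user names, and the sample log.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold separating two visits; not given in the text.
GAP = timedelta(minutes=30)


def count_visits(requests):
    """Count visits per user from (user, timestamp) request records.

    A request starts a new visit when it is the user's first request or
    when it follows that user's previous request by more than GAP.
    """
    visits = {}
    last_seen = {}
    for user, ts in sorted(requests, key=lambda r: r[1]):
        if user not in last_seen or ts - last_seen[user] > GAP:
            visits[user] = visits.get(user, 0) + 1
        last_seen[user] = ts
    return visits


# Hypothetical log entries, for illustration only.
log = [
    ("alice", datetime(2000, 1, 3, 9, 0)),
    ("alice", datetime(2000, 1, 3, 9, 5)),
    ("bob", datetime(2000, 1, 3, 9, 10)),
    ("alice", datetime(2000, 1, 3, 14, 0)),
]
visits = count_visits(log)
```

Here the two morning requests from the first user fall into one visit and the afternoon request starts a second one.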

Figure 4 shows the average number of queries 
entered per visit per week. The number is low at 
the beginning. People were testing the repository 
with an average of one query, presumably to see 
the behavior and the functionality. Later the people 
seem to search more seriously for information. 
Further information provided to the department 
about the intent and use of the search repository 
probably caused the high increase at the end. The
combination of Figures 3 and 4 is interesting. It
shows that the number of visits seems to be stabilizing
but that the number of queries per visit is
increasing. This demonstrates a relatively efficient
usage pattern.

Figure 4. Average search queries per
visit
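The measure plotted in Figure 4 is a plain ratio, which a short sketch makes explicit. The weekly counts below are invented to mirror the pattern described (visits levelling off while queries per visit rise); they are not the measured values, and the function name is hypothetical.

```python
def queries_per_visit(weekly_counts):
    """Average number of queries per visit for each (visits, queries) pair."""
    return [
        round(queries / visits, 2) if visits else 0.0
        for visits, queries in weekly_counts
    ]


# Hypothetical (visits, queries) pairs per week, for illustration only.
weekly_counts = [(40, 40), (60, 90), (50, 100)]
averages = queries_per_visit(weekly_counts)
```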

The more popular search queries entered during the
last 8 weeks of the project fell basically into two
classes: technology issues and process
issues. The technology issues include ActiveX and
XML. The process issues were classified as
“process” in general and “estimation process”. The
data indicates a large diversity of information needs
in the organizational environment. The repository
was able to give back possibly useful documents
for most queries. However, we do not have any
information to indicate whether the returned
documents were useful or not.

Some queries did not return documents since the
repository contained insufficient documents, for
example on new technologies like the XML language.
As with the usage report, we need to be careful 
with the query data because it is only initial data 
from a short period of use. 

The reports, although only initial, provide some 
preliminary indications. 

• The acceptance, i.e. usage of the repository 
was promising. 

• The usage frequency indicates a degree of 
integration into the daily information search 
activities. 

• The information which was needed in the 
department covers a very wide range of areas. 

• Process and new technology information 
seems to be of special interest. 

• Informing people about the presence and the 
usefulness of the concept is important. 

These were the initial conclusions from the limited 
data available. Surveying the repository users then 
extended these. 







Survey about usage benefits

To get direct user feedback we decided to conduct 
an informal survey of staff impressions while using 
the search engine, ideas the users had, and the 
benefits realized through being able to search the 
local environment documents. Overall the 
acceptance and judgement of the product was good. 
The feedback ranged from ideas for extension, 
descriptions of how people used the repository, to 
first impressions. The following points capture the 
most common critical aspects, benefits, and 
extension ideas gained from the survey. 

We found that the people who had been working in 
the department for a long time knew where 
information could be found without using the 
repository (e.g. document templates or whom to 
ask to get information). The opportunity for this 
will reduce as the department grows. We would 
then predict that the repository could play a 
stronger role in information transfer. 

The benefits that were noted included the comment
that the search web site is a good address for new
employees who are not familiar with the work 
environment. People also reported that they found 
documents and information that had been lost. The 
average time saved through this was estimated to 
be in a range of 1 to 4 hours. The search engine 
also reportedly breaks down information barriers 
between projects and environments (sharing 
experience & knowledge). It was seen as a good 
thing to first search for local information and 
experience before proceeding. 


6. Conclusions 

At the end of our implementation of cycle #1 we 
assessed which of our initial expectations for the 
defined approach were met. Earlier we described 
our expectations, which we now examine. Our
experience management environment (EME) had
been usable for only 8 weeks, and hence the underlying
data has to be viewed carefully and further trends
have to be monitored to confirm the findings.

Our experience is that we generally achieved the 
technical objectives. In this respect the project was 
successful. We are relatively confident that the 
experience management environment could help 
support improvement in this environment. The 
data that was available at this time was too 
preliminary to justify strong conclusions about 
usage of the experience base. The usage patterns 
indicated a trend towards consistent use and 
integration into the work cycle. The project proved 
the viability of the bottom-up approach selected in 
this organization. Whether this will apply in other 
organizations clearly depends on many factors. We 
have outlined what we believe these factors to be. 


They range from broad organizational and cultural 
characteristics to technology characteristics. The 
most important evidence, we believe, is the clear 
establishment of a substantial experience base in an 
organizational setting in a short time period, which 
showed indications of successful deployment. 

So what went wrong? Surprisingly, given the 
positive comments by the users, the system was 
decommissioned shortly after the completion of the 
project. A major contributor to this was the lack of 
ongoing management commitment to the project. 
While a senior manager was the initial champion of 
the project, its implementation was assigned to a 
busy project leader. In retrospect greater emphasis 
should have been placed on ensuring that the 
project champion maintained a more visible 
presence with respect to the experience factory 
project. A second issue was the lack of 
identification of clear goals and payback criteria for 
the project. It appears that, although technology 
can support this type of experience base 
development, a top down GQM-based 
methodology has the characteristics that are more 
likely to ensure longer-term success. The third 
observation was that the close physical proximity
of the development teams and the relatively small
number of personnel worked against the need for a
more formal repository-based experience factory.
The metrics success factors documented by Jeffery 
and Berry in [9] might provide an indicator of 
factors relevant to EF success as well. For example,
they list senior management commitment, realistic
assessment of payback, clear responsibilities, and
determination of required granularity, among many
others. The issue of physical proximity has been
observed by the authors in the context of electronic 
conferencing as a major implementation issue. 
1 Introduction

Software is a major expense for most organizations and is on the critical path to almost all organizational 
activities. Individual software development organizations in general strive to develop higher quality 
systems at a lower cost for both their internal and external customers. Yet the processes used to develop 
such software are still very primitive in the way that experience is incorporated. Learning is often from 
scratch, and each new development team has to relearn the mistakes of its predecessors. Reuse of an 
organization’s own products, processes, and experience is becoming more accepted as a feasible solution to 
this problem. But implementation of the idea, in most cases, has not gone beyond reuse of small-scale code 
components in very specific, well-defined, situations. True learning within a software development 
organization requires that organizational experiences, both technological and social, be analyzed and
synthesized so that members of the organization can learn from them and apply them to new problems.

Suppose, for example, that a member of a software development group is considering the use of a particular 
software engineering technology on a forthcoming project. This member has heard that this technology has
been used successfully in other projects in some other part of the organization, but cannot easily find out
where or by whom. He or she would like very much to learn from the experiences of those previous
projects, first to help make the decision to use the technology or not, then to help implement the technology
in the current project. It would be helpful, obviously, to avoid the inevitable mistakes that are made the
first time a new technology is tried. Also, it would be useful to see the costs of using that technology (e.g.
the costs of new tools or training) in order to help estimate those costs for the current project. Without the
organizational infrastructure to support access to previous experience from within the organization, this
type of information would be very difficult, if not impossible, for the development team member to get. 

This paper describes a system for supporting experience management in a multinational software 
improvement consultancy called Q-Labs. This Experience Management System (EMS) is based on the 
Experience Factory concept [1] proposed by Basili. This paper focuses on describing the design principles 
behind EMS and reports the results of an evaluation of its interface.

2 The Experience Factory 

Basili proposed the Experience Factory as an organizational infrastructure to produce, store, and reuse
experiences gained in a software development organization [1,2,3]. The Experience Factory idea organizes 
a software development enterprise into two distinct organizations, each specializing in its own primary 
goals. The Project Organization focuses on delivering the software product and the Experience Factory 
focuses on learning from experience and improving software development practice in the organization. 
Although the roles of the Project Organization and the Experience Factory are separate, they interact to 
support each other’s objectives. As illustrated in Figure 1, the feedback between the two parts of the 
organization flows along well-defined channels for specific purposes. Also, the Experience Factory
supports the meta process defined by Basili’s Quality Improvement Paradigm (QIP) [6]. As shown in
Figure 1, for each new project: the problem at hand is characterized (1), goals are set (2), a suitable process 
is chosen (3), the process is executed and measured (4), outputs are analyzed (5), and lessons and products 
are packaged and stored in the experience base for future reuse (6). 

Experience Factories recognize that improving software processes and products requires: (1) continual 
accumulation of evaluated and synthesized experiences in experience packages; (2) storage of the
experience packages in an integrated experience base accessible by different parts of the organization; and 
(3) creation of perspectives by which different parts of the organization can look at the same experience 
base in different ways. Some examples of experience packages might be the results of a study investigating 
competing design techniques, a software library that provides some general functionality, or a set of data on 
the effort expended on several similar projects. 



Figure 1. Experience Factory structure


The Experience Factory concept has been implemented in a number of software development organizations 
that have addressed the above questions in various ways (e.g. [4,5,9]). The Software Engineering 
Laboratory (SEL) [4] is an example of an Experience Factory. The SEL Quality Improvement Paradigm 
provides a practical method for facilitating product-based process improvement within a particular 
organization. Because it directly ties process improvement to the products produced, it allows an 
organization to optimize its process for the type of work that it does. Using this approach, the SEL has
reduced development costs by 60%, decreased error rates by 85%, and reduced cycle time by 20% over the
past 10 years. Establishing an Experience Factory, however, is a long-term endeavour requiring a great deal
of commitment on the part of both management and development staff. Implementing an Experience
Factory involves substantial up-front costs. It requires instilling a new philosophy of learning into an
organization, and establishing an organizational structure and processes for the Experience Factory to collect,
package and share experiences. Once in place, it will also require substantial ongoing effort and 
commitment to maintain itself as an effective agent for continuous software process improvement. 

We believe that emerging computing technologies - such as distributed systems, visual query interfaces,
and intranets - offer great potential to support the establishment and maintenance of Experience Factories
in organizations. This paper reports preliminary results and experiments from a research project aimed at 
implementing a system for supporting an Experience Factory within an industrial setting. 

3 The Principles Behind the Experience Management System 

We have found it useful to discuss the problem of software experience capture and reuse, and our approach 
to addressing it, in terms of the 3-layer conceptual view shown in Figure 2. This view shows three aspects 
of the problem, all of which need to be addressed before a complete solution can be implemented. At the 
lowest level, there are issues of how experience should be electronically stored in a repository and made 
accessible across geographical boundaries. The middle level deals with user interface issues, including 
how experiences are best presented to a user and how the user interacts with the automated system to
manipulate, search, and retrieve experience. At the top level, the organizational issues of how experience
reuse will fit into the work of the organization, how the experience base will be updated and maintained, 
and how experiences will be analyzed and synthesized over time, are addressed. The bottom two levels of 
Figure 2 define the computer-intensive support pictured in Figure 1. The top level of Figure 2 defines the 
interface between the human-intensive and the computer-intensive areas described in Figure 1.
Figure 2. The three levels of an Experience Management System


Allied with this conceptual view, we have defined a set of requirements aimed at making the EMS reliable, 
easy to use, and flexible enough to support the Experience Factory concept. 


R1. The system shall support geographically distributed organizations allowing them to share and
manage experience packages remotely. 

R2. The repository shall be robust, reliable, and portable to standard computer platforms. 

R3. The user interface level shall be as platform independent as possible. 

R4. The data model shall be simple but powerful enough to model diverse classes of “experience 
packages.” The system will adapt to the current practices, processes, and products of different 
organizations, and not vice-versa. 

R5. The system shall be easy to learn and self explanatory. The user interface shall be easy to use 
and the stored information shall be easy to search and retrieve. 

This conceptual view, along with the requirements, forms the basis of several ongoing efforts to implement
experience management systems in a variety of settings. The first of these efforts, the Q-Labs EMS, is
described in this paper. Lessons learned from our work with Q-Labs will be fed into other EMS efforts in 
the future. 


4 The Q-Labs EMS 

The Experimental Software Engineering Group (ESEG) at the University of Maryland and Q-Labs, Inc., 
have been working together for nearly three years on a project aimed at building the infrastructure to
support a true Experience Factory within Q-Labs, resulting in an Experience Management System (from
here on called the “Q-Labs EMS”). Q-Labs is a multi-national software engineering consulting firm that 
specializes in helping its clients improve their software engineering practices by implementing state-of-the- 
art technologies in their software development organizations. Q-Labs has helped many of its clients 
implement some of the principles of the Experience Factory. Q-Labs’ objectives for this project have been 
to provide a “virtual office” for the organization, which is spread across two continents, and to allow each 
Q-Labs consultant to benefit from the experience of every other Q-Labs consultant. 




4.1 System Architecture 

In order to fulfill the first requirement presented in section 3, to support geographically distributed 
organizations, the Q-Labs EMS is a client-server system. The clients enforce the policies defined at the 
procedural level and implement the system front-end applications defined by the user interface level (here 
referring to the levels in Figure 2). The server implements the system repository. The architecture of the 
system is shown in Figure 3. It follows a three-tier model. At the top level, we have the EMS Manager and 
EMS Visual Query Interface (VQI) applications. They work as client applications sending requests to a 
“middle tier” of services. This “EMS Server” receives messages from the client applications and translates 
them into low-level SQL (Structured Query Language) calls to an “EMS Repository.”
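The message flow of this three-tier model can be sketched in a few lines. This is a minimal illustration and not the Q-Labs implementation: the real clients are separate applications and the real repository is a commercial DBMS, whereas the sketch below uses Python's built-in `sqlite3` module as a stand-in. The class names, message fields, table layout, and sample packages are all hypothetical.

```python
import sqlite3


class EMSRepository:
    """Bottom tier: a relational store (in-memory SQLite as a stand-in)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE packages ("
            "id INTEGER PRIMARY KEY, title TEXT, topic TEXT)"
        )

    def execute(self, sql, params=()):
        rows = self.conn.execute(sql, params).fetchall()
        self.conn.commit()
        return rows


class EMSServer:
    """Middle tier: translates client messages into SQL calls."""

    def __init__(self, repository):
        self.repository = repository

    def handle(self, message):
        action = message["action"]
        if action == "store":
            return self.repository.execute(
                "INSERT INTO packages (title, topic) VALUES (?, ?)",
                (message["title"], message["topic"]),
            )
        if action == "search":
            return self.repository.execute(
                "SELECT title FROM packages WHERE topic = ? ORDER BY title",
                (message["topic"],),
            )
        raise ValueError(f"unknown action: {action}")


# Top tier: a client only builds messages and never issues SQL itself.
server = EMSServer(EMSRepository())
server.handle({"action": "store", "title": "Inspection lessons", "topic": "process"})
server.handle({"action": "store", "title": "XML pilot notes", "topic": "technology"})
hits = server.handle({"action": "search", "topic": "process"})
```

Keeping SQL confined to the middle tier is what lets the clients stay independent of the database product behind the server.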



Figure 3. Q-Labs EMS architecture 

In order to fulfill the second requirement, repository robustness and portability, the EMS Repository uses 
standard database technology. It stores all the information necessary for the EMS operation in a relational 
database managed by a commercial DBMS (Database Management System). The link between the server
and the repository is done through standard embedded SQL (PL-SQL). This makes the repository portable
to standard commercial DBMSs, and virtually portable to any platform.

In order to fulfill the third requirement, a platform independent user interface, the client applications are
implemented in Java™. This makes them portable to any platform that has a Java virtual machine. To date
we have tested the client applications - EMS Manager and EMS VQI - on Unix, Windows (NT and 98), 
and Macintosh platforms. 

4.2 Data Model 

An early, crucial task in this project has been identifying the pieces of information that should be packaged
as experience. However, each organization has different needs and experiences. In order to fulfill the 
fourth requirement, data model simplicity and flexibility, we introduce the concept of a perspective. A 
perspective defines a class of packages much like an object class in an object-oriented system. A 
perspective is defined by three parts: a classification part, a relationship part, and a body part. 

The classification part, called the perspective taxonomy, defines a classification model for the packages 
instantiated from that perspective. The perspectives’ taxonomies describe the contents of an experience 
base in an organization’s own terminology, thus guiding users intuitively towards the experiences of 
interest to them. A taxonomy is composed of attributes with well-defined naming and typing. The 
attributes effectively define the facets that can be filled by an experience packager to characterize a
package instantiated in a given perspective. 

The relationship part, called the perspective’s links, defines the relationship between the packages 
instantiated in this perspective and other packages in the experience base. Like attributes, links have a name 
and type associated with them. 




The perspective body defines the elements that compose the experience packages instantiated from this 
perspective. Like attributes and links, elements have names and types. The type is usually a file or a list of 
files. Those files are internally stored in the experience base as large objects when a package is instantiated 
from a perspective. 
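
The three parts of a perspective can be pictured as a small data structure. The sketch below is only an illustration: the class names, the use of Python types for attribute typing, and the sample attribute, link, and element names are assumptions, not the actual Q-Labs EMS schema.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """An experience package instantiated from a perspective."""
    perspective: str
    attributes: dict
    links: dict = field(default_factory=dict)
    elements: dict = field(default_factory=dict)


@dataclass
class Perspective:
    """A class of packages: taxonomy, links, and body (each maps name -> type)."""
    name: str
    taxonomy: dict  # attribute name -> Python type (hypothetical typing scheme)
    links: dict     # link name -> name of the target perspective
    body: dict      # element name -> element type, e.g. "file" or "file list"

    def instantiate(self, attributes, links=None, elements=None):
        """Create a package after checking attribute names and types."""
        for attr_name, attr_type in self.taxonomy.items():
            if attr_name not in attributes:
                raise ValueError(f"missing attribute: {attr_name}")
            if not isinstance(attributes[attr_name], attr_type):
                raise TypeError(f"{attr_name} must be {attr_type.__name__}")
        unknown = set(attributes) - set(self.taxonomy)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        return Package(self.name, dict(attributes), dict(links or {}), dict(elements or {}))


# Hypothetical "documents" perspective and one package instantiated from it.
documents = Perspective(
    name="documents",
    taxonomy={"title": str, "author": str, "tech_area": list},
    links={"produced_by": "projects"},
    body={"document_file": "file"},
)
package = documents.instantiate(
    {"title": "Inspection guide", "author": "A. Author", "tech_area": ["Inspection"]},
    links={"produced_by": "Example project"},
    elements={"document_file": "guide.pdf"},
)
```

Because the taxonomy is data rather than code, an organization could define new perspectives without changing the program, which is the flexibility the fourth requirement asks for.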

4.3 Visual Query Interface 

In order to fulfill the fifth requirement, a search and retrieval interface that is easy to learn and self- 
explanatory, we adopted a visual query interface (VQI) concept. As proposed by Shneiderman [13], visual 
query interfaces let users “fly through” stored information by adjusting widgets and viewing animated 
results on the computer screen. In EMS, they allow easy interactive querying of the repository based on
various attributes of the experience packages. Built into the interface is the set of attributes defined for the
perspective currently being viewed. Figure 4 shows the user interface for the Q-Labs EMS. Upon login a
user will have a set of perspectives from which he/she can look at stored experience packages. A user will 
fire a VQI by selecting one of those perspectives. The VQI will display the packages that are associated 
with this perspective together with the attributes and query devices (slider bars, check boxes, etc.) used to 
search and browse those packages. The widgets used on the interface are defined on the fly based on the 
data types and number of different values associated with each attribute. 
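
The idea of choosing widgets on the fly can be sketched as a simple rule over an attribute's values. The thresholds, widget names, and function name below are hypothetical; the paper only states that the choice depends on data type and on the number of distinct values.

```python
def choose_widget(values, max_checkboxes=6):
    """Pick a query device for one attribute from the values it takes.

    Assumed rules (not from the paper): few distinct values -> check boxes,
    many numeric values -> slider bar, anything else -> free-text search.
    Values of one attribute are assumed to share a single, sortable type.
    """
    distinct = sorted(set(values))
    if len(distinct) <= max_checkboxes:
        return {"widget": "checkboxes", "options": distinct}
    if all(isinstance(v, (int, float)) for v in distinct):
        return {"widget": "slider", "min": distinct[0], "max": distinct[-1]}
    return {"widget": "text_search"}


tech_area_widget = choose_widget(["QIP", "EF", "EF"])
effort_widget = choose_widget(list(range(100)))
```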

Using the VQI, the user interactively searches the experience packages associated with a certain 
perspective by manipulating the widgets on the right and observing the number of selected packages on the 
two-dimensional chart. Once a small subset of packages is selected using the VQI query devices, the user 
can quickly examine specific packages by clicking on them. This will fire a Web Page with a complete 
description of the selected package, including its links and elements. If the selected package corresponds to 
the user’s expectations, he/she can click on the desired elements to retrieve the package’s files. 
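
The interactive narrowing described above amounts to re-filtering the package set each time a widget changes. A minimal sketch follows, assuming packages are plain dictionaries and that slider bars map to value ranges and check boxes to sets of allowed values; the attribute names and data are invented for the example.

```python
def select_packages(packages, ranges=None, choices=None):
    """Return the packages that satisfy every active query device.

    ranges:  attribute -> (low, high), as set by a slider bar
    choices: attribute -> set of allowed values, as set by check boxes
    """
    ranges = ranges or {}
    choices = choices or {}
    selected = []
    for pkg in packages:
        in_ranges = all(low <= pkg[attr] <= high for attr, (low, high) in ranges.items())
        in_choices = all(pkg[attr] in allowed for attr, allowed in choices.items())
        if in_ranges and in_choices:
            selected.append(pkg)
    return selected


# Hypothetical project packages.
projects = [
    {"title": "A", "effort": 100, "tech_area": "EF"},
    {"title": "B", "effort": 400, "tech_area": "Inspection"},
    {"title": "C", "effort": 250, "tech_area": "EF"},
    {"title": "D", "effort": 900, "tech_area": "GQM"},
]
wide = select_packages(projects, ranges={"effort": (0, 500)})
narrow = select_packages(projects, ranges={"effort": (0, 500)}, choices={"tech_area": {"EF"}})
```

Widening a range or ticking another box simply reruns the filter, so the user sees the number of matches grow or shrink immediately.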

The VQI has two features that we believe are fundamental to EMS. First, its search is interactive and 
controlled by the user. This allows the user to easily control the number of matches by widening or
narrowing the search scope with a few mouse clicks. This is a clear advantage over keyword-based searches,
such as those executed by World Wide Web search engines. We hypothesize that this significantly helps
users to find packages that are useful to them even when an exact match is not available. The second key 
feature of this type of interface is that it allows people to visualize the amount of stored experience and the 
classification schema used by the organization. We believe that this significantly helps new users to get 
used to EMS and is also an important learning medium for new team members. 



Figure 4. Q-Labs Visual Query Interface (VQI) 









The user interface also has functionality to allow users to submit new experience packages to the 
experience base. This functionality uses the attributes, links, and elements associated with the perspectives 
to produce the forms that a user must complete to describe new packages. 


5 Interface prototype evaluation 

The first of several planned empirical studies to evaluate the Q-Labs EMS prototypes was an evaluation of 
the interface. This initial prototype consisted of the VQI (pictured in Figure 4), a simple data entry interface
used to submit experience packages (just a form with each field corresponding to one of the defined 
attributes for a given perspective), and a small repository populated with a collection of real Q-Labs 
documents and project descriptions. Two perspectives were also provided with this prototype. The 
documents perspective used attributes of documents (e.g. author, date, title, etc.) as the search mechanisms, 
while the projects perspective used attributes of projects (e.g. project lead, customer, start date, finish date, 
total effort, etc.). Some attributes were common to both perspectives (e.g. technical area). The evaluation 
was carried out at this point in the project (before having a full working system) because it was essential to 
get user feedback on the basic paradigms we had chosen before we proceeded further. 

5.1 Study design 

The interface evaluation study was based on qualitative methods [10]. The importance of such methods in 
validating software engineering technology is discussed by Seaman in [12]. The goals of the interface
evaluation study were: 

1. To evaluate the current set of attributes (in both the “projects” and “documents” taxonomies) in 
terms of completeness, usefulness, and clarity. 

2. To evaluate the visual query search and retrieval interface in terms of usefulness, usability, and 
appropriateness. 

3. To evaluate the data entry interface in terms of feasibility, usability, and impact on working 
procedures. 

These goals were refined into a set of questions that guided the design of the study. To answer these 
questions, two types of data were collected. The first data source consisted of detailed notes concerning 
how the subjects used the prototype and the comments they made while using it. The second data source 
came from a set of interviews that were conducted at the end of each evaluation session. The questions 
asked during the interviews are shown below in Figure 5. 


1. What did you like most about the search and retrieval interface?

2. Was there anything really annoying about using it?

3. Was it easy to move around and do things with the mouse and keyboard?

4. Is there any information that would be useful to include in the interface that isn’t there?

5. Are there any attributes that are not clear in their meaning?

6. What attributes did you use most in searches?

7. Did you feel that you were able to find what you were looking for using the interface?

8. How satisfied are you that tasks can be completed with the minimal number of steps?

9. How could the interface be improved?

10. Do you think you would use this tool, once the database was populated, in your everyday work? Does it support the way
you normally work?

11. What did you like most about the data entry interface?

12. Was there anything really annoying about using it?

13. Were there any parts of the data entry interface where it wasn’t clear what information you should enter?

14. How was using this interface different from the usual procedure for recording this type of information? Do you think, in
general, that this would save you time or not?

15. How could the interface be improved?

16. Do the different parts of the system have consistent appearance and work in similar ways?

17. How satisfied are you with the system appearance in terms of color, layout, and graphics usage?

Figure 5. Evaluation Interview Questions 

Interface evaluation sessions were held with five different Q-Labs consultants from three different offices 
in May and June of 1999. In each session, the subject was given a short hands-on training, then given a set 
of exercises that represented common Q-Labs work scenarios. The exercises were taken from the set of use 
cases we had collected as part of the initial requirements gathering activity for EMS. The subjects were 
asked to choose some of the exercises and then to use the Q-Labs EMS prototype to gain information
relevant to the scenario described in each exercise. They were also asked to verbalize their thoughts and
motivations while working through the exercises. This technique, called a “think aloud” protocol [8], is 
often used in usability studies (and occasionally in other software engineering studies [14]) to capture a 
subject’s immediate impressions, thought processes, and motivations while performing a task. The subjects 
could and did ask questions of the researcher conducting the session. After several exercises had been 
completed, a short interview was conducted, using the questions presented above as an interview guide. 

All the sessions were audiotaped and observed by at least one researcher. Each session lasted about 1.5 to 2 
hours. Although the tapes were not transcribed verbatim, they were used to write very detailed notes after 
the fact. 

The notes written from the tapes served as the major data source for the analysis part of the study. The 
analysis method used was the constant comparison method [7,10]. This method begins with coding the 
field notes by attaching codes, or labels, to pieces of text that are relevant to a particular theme or idea that 
is of interest in the study. Then passages of text are grouped into patterns according to the codes and 
subcodes they’ve been assigned. These groupings are examined for underlying themes and explanations of 
phenomena. The next step is the writing of a field memo that articulates a proposition (a preliminary 
hypothesis to be considered) or an observation synthesized from the coded data. In this case, the field 
memo written as part of this process became the results of the study, which are reported in the next section. 
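
The coding and grouping steps of this analysis can be pictured with a small sketch. The passages and code labels below are invented for illustration and are not taken from the study's field notes; the function name is likewise hypothetical.

```python
from collections import defaultdict


def group_by_code(coded_passages):
    """Group passages of field notes under each code attached to them."""
    groups = defaultdict(list)
    for passage, codes in coded_passages:
        for code in codes:
            groups[code].append(passage)
    return dict(groups)


# Invented examples of coded field notes: (passage, list of codes).
notes = [
    ("Subject hesitated before moving the slider", ["slider", "usability"]),
    ("Subject asked what the slider range meant", ["slider"]),
    ("Subject wanted a menu of allowed values", ["data-entry"]),
]
groups = group_by_code(notes)
```

Each resulting group is then read as a whole to look for an underlying theme, which is the step that leads to the field memo.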

5.2 Results 

The subjects generally liked the basic elements of the search and retrieval interface. In particular, they 
seemed to have no trouble mastering the search mechanism and liked how it was easy to negotiate the 
interface and see the distribution of packages among different attribute values. They also liked the 
immediate feedback in the graph part of the interface in response to changes made with the search 
mechanisms. Subjects were also able to glean useful information from the interface even when they
couldn’t find exactly what they were looking for. For example, one subject found a document that was not 
exactly what she wanted, but she saw the primary author’s name and decided that would be a good contact, 
and so she felt she had found useful information. 

The learning curve on the search and retrieval interface was fairly short. By the second or third exercise 
tried, all of the subjects were conducting their searches very rapidly and confidently. For some subjects, it 
was even quicker. Subjects generally narrowed their searches down to about 2-4 “hits” before looking at 
individual packages. This was seen as a “reasonable” number of packages to look through.

Several major annoyances surfaced during the evaluation. One was the use of slider bars. Several subjects 
had trouble figuring out the mechanics of using and interpreting them. Several subjects suggested using 
some form of checkboxes instead of the slider bars. Another annoyance had to do with the relationship
between the two perspectives and the lack of linkage between them. After finding some relevant project in 
the projects perspective, subjects had to then start up the document perspective and start a search from 
scratch in order to find documents related to the project. A related problem was the confusion caused by 
some attributes and attribute values existing in one perspective but not the other. 

As for the data entry interface, the data being collected was seen to be appropriate, but otherwise it left a lot 
to be desired. Subjects in general found the data entry interface unusable because they needed more 
guidance as to what attribute values to enter in the fields. Almost all of the subjects suggested pull-down 
menus or automatic fill-ins to decrease the amount of typing and increase consistency. In general, the
subjects saw this interface as just a skeleton of what was needed. 

All of this was valuable feedback that has been used in our plans for further development of the Q-Labs 
EMS. Although we knew that the interface we were evaluating was not ideal, we had not anticipated some 
of the specific problems that our subjects reported. For example, we had not considered the slider bar 
mechanism to be a problem, but our subjects definitely did. Also, although we knew the data entry 
interface needed some improvements (many of the suggestions from the subjects were already in our 
development plans), we had not considered it as completely unusable as our subjects did. On the other 
hand, the study validated some of our basic choices in the interface design, e.g. the VQI and the use of 
attributes and perspectives. Thus we can, with confidence, continue improvement of the interface without 
changing the underlying structure.



There were also some lessons learned about how the interface evaluation was conducted. Some problems 
came up related to the limited scope of the repository. Subjects were sometimes frustrated when there was 
nothing to be found for their search criteria. Subjects were also bothered by inconsistencies in the sample 
data. In particular, one subject found that there was a document in the documents perspective that had a
project name associated with it, but that project was not to be found in the projects perspective. 

The interface evaluation, in general, proved to be a valuable and timely tool for getting feedback from the 
eventual users of the Q-Labs EMS. The effort involved was relatively small, although finding and 
scheduling subjects was difficult and caused some delays. Although much remains to be done before an 
operational system is delivered, the evaluation assured us that the Q-Labs EMS will eventually be 
acceptable to its intended users. In addition, the evaluation provided an opportunity to disseminate the 
aims of our project, and our work thus far, throughout Q-Labs. 

6 Conclusions 

We have described an ongoing project involving the Experimental Software Engineering Group (ESEG) at 
the University of Maryland and Q-Labs, Inc. that aims to provide a system (with both organizational and 
automated elements) to support software engineering experience capture and reuse. The current design of
this system, called the Q-Labs EMS, is outlined, in particular its architecture and its user interface. 
Currently, an interface prototype exists and has been evaluated. This evaluation is described in detail. The 
results of the evaluation have assured us that the Q-Labs EMS will not only eventually be successfully
deployed throughout Q-Labs, but will also serve as a testbed for our further investigation of software
experience capture and reuse. However, much needs to be done before a working version of this system is 
in place. The prototype that has been evaluated encompassed only some of the automated features of the 
system. Much of the technical work remains, as well as the organizational part of the system. The latter 
includes designing, implementing, and evaluating new organizational procedures and deployment strategies 
to ensure the acceptance of EMS at Q-Labs. 
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appropriate infrastructure for an Experience Factory 
Q-Labs EMS is the first prototype EMS 











[Slide figure: perspective taxonomy example with attributes Title, Author (string), and Tech area (list of strings, e.g., EF, Inspection); remaining figure text is not recoverable]

Evaluation was relatively low-cost and 
high-benefit 



The first results from a much-awaited line 
of research on developing infrastructure for 


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Integrated Financial Management System 

- Contractor provided COTS >MLOC product 
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System engineers often have little software expertise 
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Technologies have advanced in the small 

❖ When we attempt to solve large problems, new approaches easily hit limits 

❖ Today’s problems are no more complex than problems 20 years ago 
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Supported by Process* 




SEAS Center data for over 90 projects 1988 - 1998 
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Challenge 

Research and development to define new processes 
and tools 




Industry Perspective: S/W Future 





Past Hits and Misses

Software Research Today

It Might Not Get Easier

It Might Not Get Easier, II



Session 7: Inspections 


Edward Weller, Bull HN Information Systems 


Al Florence, MITRE


Amarjit Singh Marjara, Cap Gemini AS 


SEW Proceedings 


SEL-99-002 




24th Annual Software Engineering Workshop
Dec 1-2, Goddard Space Flight Center


Quantitative Methods Do Work 


Edward F. Weller 
Fellow, Software Process 
Bull HN Information Systems 
13430 N. Black Canyon 
Phoenix, AZ 85029 


Tele: (602) 862-4563 
Fax: (602) 862-4288 

e-Mail: e.weller@bull.com 


© Bull, 1999 





Quantitative Methods Do Work 

Quantitative methods, including statistical process control, can be effective tools for 
predicting and evaluating product quality during development and test. The data analysis 
and conclusions from applications of quantitative methods, including statistical process 
control, to two projects that were major components of a software release to Bull HN 
Information Systems’ GCOS 8 Operating System, will show how these techniques were
effective and useful. During development and test, we used the release quality predictions 
as one of the project metrics. We found that analysis of inspection and test results using 
SPC techniques helped us predict (perhaps understand is a better word) the release 
quality and the development processes controlling the release quality. We were able to 
answer the question “Can we ship this product?” with data rather than guesswork. 

Inspections have been used in GCOS 8 development since 1990¹. The process is stable
and provides data used by project management². Our goal in the current release was to
use defect density during development and test as input to predicting the post-ship
product quality with reasonable assurance. We are aware of the problems with using
defects to predict failures (Adams³, Fenton and Pfleeger⁴), but in the absence of other
data or usage-based testing results, this was what we had to evaluate release quality.

Prediction: Stable versus Unstable Processes 

Predicting the future behavior of a process cannot be done unless the process itself is 
stable. This is a reason for using statistical methods. A variety of techniques can be used 
to evaluate the underlying process stability. Control charts can be used to calculate upper 
and lower control limits (UCL and LCL). Processes that stay within limits and do not 
exhibit other indications of lack of control can be assumed to be “controlled processes”. 
This implies several things about the process: 

• Past performance can be used to predict future performance within the control 
limits 

• Process capability relative to a customer specification can be determined 
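
One common way to compute such limits is the individuals (XmR) chart, which estimates the limits from the average moving range. The paper does not say which chart type was used, so the choice of XmR, the function names, and the sample values below are assumptions for illustration only.

```python
def xmr_limits(values):
    """Return (LCL, center line, UCL) for an individuals (XmR) chart.

    Uses the standard XmR constant 2.66 (3 / 1.128) applied to the
    average moving range between consecutive observations.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def out_of_control(values):
    """Indices of observations that fall outside the control limits."""
    lcl, _, ucl = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical measurements, e.g. one value per inspection.
stable = [10, 12, 11, 13, 12, 11, 10, 12]
with_spike = stable + [40]
```

Points flagged by `out_of_control` are candidates for a special-cause investigation; only after such causes are understood does it make sense to use the limits for prediction.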

Estimating Defect Injection 

Previous inspection process and product data were evaluated and estimates were made 
for: 

• Defect injection rates 

• Defect removal rates (inspection effectiveness⁵)

• Defects entering unit test 

Prior inspection, test, and post ship defect history was used to estimate the defect 
injection rate. This is a potential area for applying SPC. With enough data, you can 
establish ranges for defect injection rates, accuracy of size estimates, and inspection 


⁵ Inspection effectiveness is the percentage of major defects removed in each inspection phase, or total
defects removed in inspections, divided by the total number of defects in the product at the time of the 
inspection. Since the total number of defects discovered is never known until a product is retired from use, 
effectiveness is always an estimate, but one that changes very slightly after a product is shipped, assuming 
reasonable post ship quality levels. 
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removal effectiveness. For these projects we did not have sufficient data samples to do 
this, so we based our estimates on specific product and project history. 


The size, defect injection rates, and prior inspection data were used to develop a defect 
injection and removal profile. 
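
A defect injection and removal profile of this kind can be sketched as a running balance over the life cycle phases: defects present in a phase are those carried over plus those newly injected, and each phase removes a fraction of what is present. The phase names, injection counts, and effectiveness values below are invented for illustration; they are not the project's data.

```python
def depletion_profile(injected, effectiveness):
    """Build a per-phase profile of defects present, removed, and remaining.

    injected:      list of (phase, estimated defects injected in that phase)
    effectiveness: phase -> fraction of the defects present that are removed
    """
    remaining = 0.0
    profile = []
    for phase, new_defects in injected:
        present = remaining + new_defects
        removed = present * effectiveness[phase]
        remaining = present - removed
        profile.append({"phase": phase, "present": present,
                        "removed": removed, "remaining": remaining})
    return profile


# Hypothetical estimates.
profile = depletion_profile(
    [("design", 100), ("code", 200), ("unit test", 0)],
    {"design": 0.5, "code": 0.6, "unit test": 0.5},
)
```

Comparing the expected `removed` values with the actual counts at each phase is what allows the estimate to be revisited as the project proceeds.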



Figure 1 - Initial Defect Injection and Removal Estimate 


Inspection Data Analysis 

On these two projects the first opportunity to apply SPC was during code inspections. On 
one project, the work was divided into two parts; the creation of a product feature, and 
the revision of existing code. A histogram of preparation rates in lines of code per hour is 
shown in Figure 2. 
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Figure 2 - Preparation Rate Histogram 

Outliers were examined and eliminated when special causes of variation were discovered. 
I also compared the preparation rate distribution to the inspection rate, shown in Figure 3.
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Figure 3 - Inspection Rates for 30 inspections 

This bimodal distribution was caused by two types of code, new and changed. Figure
4 shows these two classes separately.


New 


Changed 




Figure 4 - New vs Changed Inspection Rates 

I expect new code inspections to be “better behaved” than changes to existing code.
(Many inspections of modified code are small in size, causing preparation and inspection 
rates to have a larger variance. Knowledge of the changed (old) code inspected may also 
have a wider variance than the new code). The separate views in Figure 4 are typical of 
much of the inspection data I investigate. The new code approximates a normal 
distribution as closely as you may see with real data. 
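
Separating a mixed sample into its classes before analysis can be sketched as follows. The rates are invented and the function name is hypothetical; the point is only that each class gets its own mean and spread instead of one pooled summary.

```python
from statistics import mean, stdev


def summarize_by_type(inspections):
    """Summarize inspection rates separately for each code type.

    inspections: list of (code_type, rate in lines of code per hour);
    each code type needs at least two rates for a standard deviation.
    """
    groups = {}
    for code_type, rate in inspections:
        groups.setdefault(code_type, []).append(rate)
    return {
        code_type: {"n": len(rates), "mean": mean(rates), "stdev": stdev(rates)}
        for code_type, rates in groups.items()
    }


# Hypothetical inspection rates.
sample = [
    ("new", 100), ("new", 110), ("new", 90), ("new", 105), ("new", 95),
    ("changed", 50), ("changed", 300), ("changed", 120), ("changed", 400), ("changed", 30),
]
summary = summarize_by_type(sample)
```

With data shaped like this, the changed-code class shows a much larger standard deviation than the new-code class, which is the pattern described above.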

A control chart for this data with special causes removed showed a well controlled 
process: 



Figure 5 - Preparation Rate with Outliers Removed 
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def/SLOC --^DefUCL 



Figure 6 - Defect Density Control Chart⁶

What have we learned about this product and its contribution to the system release? With 
two exceptions, the inspection process seems to be well controlled. The outliers were 
investigated (as were other inspection meetings) to understand how well the inspection 
process was performed. In this case, the outliers were for inspections of changed code, so
these outliers were judged to have an assignable cause, and the defect data was
within control limits.
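
For defect density data, one conventional choice is a u-chart, in which each inspection gets its own limits based on its size. The paper does not name the chart type behind Figure 6, so the u-chart, the function name, and the numbers below are assumptions for illustration. The lower limit is clamped at zero, since a defect density cannot be negative.

```python
from math import sqrt


def u_chart(defects, sizes):
    """Compute u-chart limits for defects per unit of size.

    defects: defects found in each inspection
    sizes:   size of the material inspected (e.g. SLOC), same order
    Returns the overall density and one row of limits per inspection.
    """
    u_bar = sum(defects) / sum(sizes)
    rows = []
    for found, size in zip(defects, sizes):
        half_width = 3 * sqrt(u_bar / size)
        rows.append({
            "density": found / size,
            "lcl": max(0.0, u_bar - half_width),  # clamp: density cannot be negative
            "ucl": u_bar + half_width,
        })
    return u_bar, rows


# Hypothetical inspection results.
u_bar, rows = u_chart(defects=[5, 3, 4, 20], sizes=[500, 300, 400, 400])
flagged = [i for i, r in enumerate(rows) if not r["lcl"] <= r["density"] <= r["ucl"]]
```

Smaller inspections get wider limits, which fits the observation that small inspections of changed code vary more.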


Once we were reasonably confident the inspection process was controlled, we developed 
defect depletion curves for the projects and the system release. 



[Chart legend: Phase Inj Est; Phase Expected Removal; Phase Actual Removal; Cumul Actual Removal; Cumul Expected Removal; Cumul Inj Est]


Figure 7 - Defect Depletion at End of Code Inspection


⁶ The lower control limits cannot be less than zero, although for convenience the LCL was plotted on this
chart as calculated. Once you verify the data is above the LCL, for possible values of LCL, it is probably
better to delete this line from the chart.
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Figure 8 - Replotted Defect Depletion with New Size Estimate 


Unit and Integration Test 

Both projects kept accurate records of defects found during Unit and Integration test. 
Both projects developed test objective matrices and developed test plans and 
specifications, so we expected to remove defects more effectively than the
30-50% “norm” often quoted in the industry.



Figure 9 - Project One Defect Depletion 


Figure 9 shows project one, as it was about to enter System Test (this chart is used in our 
monthly project review as well as the weekly team meetings). It shows the re-estimate for 
the number of defects injected. Note the defect removal in Unit Test was higher than 
estimated and that subsequently in the two phases of Integration Test a small number of 
defects were removed. Without accurate defect removal data from Unit Test these low 
numbers would be of more concern with respect to product quality. The Current Timeline 
is indicated to show the furthest stage where the project defect removal is happening. 


This analysis continued through System Test as shown in Figure 10. 
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I Build 1 Actual 
[Chart legend and axis labels: Other Actual; Build 3 Actual; Est. Defects Remaining; Maint. Actual; Est. Removal Rate (50%); phases Int-2 Test, System Test, FST; x-axis Fiscal Weeks]



Figure 10 - System Test 

Conclusions 

You should ask two questions about any metric or analysis technique: 

• Is it useful? Does it provide information that helps make decisions?

• Is it usable? Can we reasonably collect the data and do the analysis?

We found that the knowledge we gained about product quality and the processes used to 
develop these products gave a definite “Yes” to both these questions. 


¹ E. Weller, “Lessons Learned from 3 Years of Inspection Data”, IEEE Software, Sept 1993
² E. Weller, “Using Metrics to Manage Software Projects”, IEEE Computer, Sept 1994
³ E. Adams, “Optimizing Preventive Service of Software Products”, IBM Journal of Research &
Development, Jan 1984
⁴ N. Fenton and S. Pfleeger, Software Metrics, PWS Publishing Company, 1997, pp 344-348
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Quantitative Methods? 




If using quantitative methods doesn’t meet a



How Well Are We Doing Inspections? 
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How Well Are We Doing Inspections? 



Two special causes of variation removed 






Expected Removal Phase Actual Removal 









How Well Are We Doing in Test? 





How do you know if this detection rate is good or bad? 





Evaluating the Product 



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Abstract 

The Software Engineering Institute’s (SEI) Software (SW) Capability Maturity Model (CMM) 
Level 4 Quantitative Analysis leads into SW-CMM Level 5 activities. Level 4 Software Quality 
Management (SQM) Key Process Area (KPA) analysis, which focuses on product quality, feeds 
the activities required to comply with Defect Prevention (DP) at Level 5.[1] Quantitative 
Process Management (QPM) at Level 4 focuses on the process which leads to Technology 
Change Management (TCM) and Process Change Management (PCM) at Level 5. At Level 3, 
metrics are collected, analyzed, and used to track development status and to make corrections to
development efforts, as necessary. At Level 4, metrics are quantitatively analyzed to control 
process performance of the project and to develop a quantitative understanding of the quality of 
products to achieve specific quality goals. At Level 5, the Level 4 analysis is used, as 
appropriate, to investigate and incorporate new processes and technologies and for the 
prevention of defects. 

This paper presents the application of Statistical Process Control (SPC) in accomplishing the 
intent of SQM and QPM and applying the results to DP. Real project results are used to 
demonstrate the use of SPC as applied in a software setting. Presented are the processes that the 
author formulated, launched and conducted on a large software development effort. The 
organization had obtained SW-CMM Level 3 compliance and was pursuing Level 4 and Level 5. 
All Level 4 and Level 5 processes were installed and conducted on the project over a period of 
time. The main quantitative tool used was Statistical Process Control utilizing control charts. 
The project analyzed life cycle metrics collected during development for requirements, design, 
coding, integration, and during testing. Defects were collected during these life cycle phases and 
were quantitatively analyzed using statistical methods. The intent was to use this analysis to 
support the project in developing and delivering high quality products and at the same time using 
the information to make improvements, as required, to the development process. 
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Introduction 


This introduction presents an overview of SPC and why it is applied to software. It presents a 
review of the Level 4 KPAs and Defect Prevention at Level 5. Next, Level 4 quality goals and
plans to meet those goals are described, followed by examples of applying SPC
to real project data.

Control Charts 


Figure 1 shows a control chart and demonstrates how control charts are used for this analysis. [3] 
According to the normal distribution, about 99.7% of all normal random values lie within +/-3 standard
deviations of the mean (3-sigma). [3] If a process is mature and under statistical process
control, all events should lie within the upper and lower control limits. If an event falls outside
the control limits, the process is said to be out of statistical process control, and the reason for this
anomaly needs to be investigated and the process brought back under control.
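
The fraction of normally distributed values that fall within k standard deviations of the mean can be checked numerically with the error function. This is a small worked check using only the standard library, not part of the original analysis; the function name is chosen for this example.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return erf(k / sqrt(2))


coverage_1 = normal_coverage(1)
coverage_2 = normal_coverage(2)
coverage_3 = normal_coverage(3)
```

For k = 3 the result is roughly 0.997, so on the order of three observations in a thousand from a stable, normally distributed process would be expected to fall outside the limits by chance alone.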


Measurements 



Time ► 


Figure 1. Control Chart 


Control charts are used because they separate signal from noise, so when anomalies occur they 
can be recognized. They identify undesirable trends and point out fixable problems and potential 
process improvements. Control charts show the capability of the process, so achievable goals 
can be set. They provide evidence of process stability, which justifies predicting process 
performance. 


Control charts use two types of data: variables data and attributes data. Variables data are 
usually measurements of continuous phenomena. Examples of variables data in software 
settings are elapsed time, effort expended, and memory/CPU utilization. Attributes data are
usually measurements of discrete phenomena such as number of defects, number of source 
statements, and number of people. Most measurements in software used for SPC are attributes 
data. It is important to use the correct data on a particular type of control chart. [3] 

Quantitative Analysis Flow 


Figure 2 shows the Level 4 Quantitative Analysis process flow for Software Quality
Management and for Quantitative Process Management. [1]

Figure 2. SQM and QPM Flow (participants shown: Project Management, Analysis Staff; PAT = Process Action Team)

When conducting quantitative analysis on project data, the results can be used for both Software
Quality Management and Quantitative Process Management. If the data analyzed are defects
detected, the intent is to reduce defects throughout development, in the activities that detected
them, thus satisfying SQM. When out-of-statistical-control conditions occur,
the reason for the anomaly is investigated and the process brought back under control, which
satisfies QPM.

Defect Prevention Flow 


Figure 3 shows the Level 5 Defect Prevention process flow. [1]

Defects can occur during any life cycle activity against any and all entities. How often do we see
requirements that are without problems or schedules that are adequate or management that is 
sound? Defect Prevention activities are conducted on any defects that warrant prevention. 
Defect prevention techniques can be applied to a variety of items: 

• Project Plans 

• Project Schedules 

• Standards 

• Processes 

• Procedures 

• Project Resources 

• Requirements 

• Documentation 

• Quality Goals 

• Design 

• Code 

• Interfaces 

• Test Plans 

• Test Procedures 

• Technologies 

• Training 

• Management 

• Engineering 

Level 4 Feeds Level 5 

Figure 4 shows how data collection, analysis and management from Level 4 activities lead to the 
activities at Level 5 of Defect Prevention, Technology Change Management, and Process 
Change Management KPAs.[5] 


Figure 4. Level 4 and Level 5 Paths of Influence (columns: Level 4, Level 5)


Quantitative Process Management, which focuses on the process, leads to making process and 
technology improvements while Software Quality Management, which focuses on quality, leads 
to preventing defects. 

Level 4 Goals and Plans


The CMM requires that Level 4 goals, and plans to meet those goals, be based on the processes 
implemented, that is, on the processes’ proven ability to perform. [1] Goals and plans must also
reflect contract requirements. As the project’s process capabilities and/or contract requirements 
change, the goals and plans may need to be adjusted. 

The project that this paper is based on had the following key requirements: 

• Timing - subject search response in less than 2.8 seconds 98% of the time

• Availability - 99.86%, 7 days a week, 24 hours a day (7/24)

These are driving requirements that constrain hardware and software architecture and design. To
satisfy these requirements, the system needs to be highly reliable and to run on sufficiently fast
hardware.

Goals 

The planned quality goals are: 

• Deliver a near defect free system

• Meet all critical computer performance goals 


Plans 

The plans to meet these goals are: 

• Defect detection and removal during 

- Requirements peer reviews 

- Design peer reviews 

- Code peer reviews 

- Unit tests 

- Thread tests 

- Integration and test 

- Formal tests 

• Monitoring of critical computer resources 

- General purpose million instructions per second (MIPS) 

- Disc storage read inputs/outputs per second (IOPS) per volume

- Write IOPS per volume

- Operational availability 

- Peak response time 

- Server loading 


Quantitative Analysis Examples 

The following are real examples from the project discussed above applying SPC to real data over 
a period of two years. 

Example 1


Table 1 shows raw data collected at code peer reviews over a period of months. Each sample
represents a series of peer reviews over several weeks. “Units” is the number of units of code and
“SLOC” is the number of source lines of code (SLOC) reviewed for that sample. “Defects” is
the number of defects detected for that sample; the last column normalizes it to defects per
1000 lines of code.

Table 1. Code Peer Review Defects 


Sample         Units   SLOC   Defects   Defects/KSLOC
1. Mar 1998        6    515        15           29.12
2. Apr 1998       10    614        16           26.06
3. Apr 1998        7    573         7           12.22
4. Apr 1998        7    305         7           22.95
5. Apr 1998        4    350        21           60
6. Apr 1998        3    205         2            9.76
7. Apr 1998        8    701        11           15.69
8. May 1998        3    319         3            9.40
Totals            76   3582        72



The formulas for constructing the control chart follow. [3] The control chart used is a U-chart.

• Defects/KSLOC = Number of Defects * 1000 / SLOC reviewed per sample (calculated
for each sample). These are plotted as Plot.

• CL = Total Number of Defects * 1000 / Total SLOC reviewed

• a(i) = SLOC reviewed / 1000 (calculated for each sample)

• UCL = CL + 3 * SQRT(CL / a(i)) (calculated for each sample)

• LCL = CL - 3 * SQRT(CL / a(i)) (calculated for each sample)

The defects per 1000 lines of code are plotted on the chart. The center line (CL) is the overall
average defect rate, while a(i) is the size of each sample in KSLOC. The upper control limit (UCL) and the lower
control limit (LCL) are also calculated for each sample. The calculations are shown in Table 2.
Whenever the LCL is negative, it is set to zero.
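
As a minimal sketch of the U-chart arithmetic described above, the center line and the per-sample limits could be computed as follows. The function names are illustrative assumptions and do not come from the project; the example call uses the totals and the first sample as printed in Table 1.

```python
import math

def center_line(total_defects, total_sloc):
    """Overall defect rate per KSLOC across all samples (CL)."""
    return total_defects * 1000.0 / total_sloc

def u_chart_limits(cl, sloc_reviewed):
    """Return (UCL, LCL) for one sample; a negative LCL is set to zero."""
    a = sloc_reviewed / 1000.0         # sample size in KSLOC
    spread = 3.0 * math.sqrt(cl / a)   # 3-sigma half-width for this sample
    return cl + spread, max(0.0, cl - spread)

cl = center_line(72, 3582)                    # totals as printed in Table 1
ucl, lcl = u_chart_limits(round(cl, 1), 515)  # first sample, 515 SLOC
print(f"CL={cl:.1f} UCL={ucl:.2f} LCL={lcl:.2f}")
```

Because the limits depend on the size of each sample, small samples get wide limits and the lower limit is often clipped to zero, which is what the later rows of the calculation table show.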

Table 2. Calculations for Code Peer Review Defects 


Sample         Plot    CL     UCL     LCL     a(i)
1. Mar 1998    29.13   20.1   38.84   1.36    0.515
2. Apr 1998    26.06   20.1   37.27   2.96    0.614
3. Apr 1998    12.22   20.1   37.87   2.333   0.573
4. Apr 1998    22.96   20.1   44.45   0       0.305
5. Apr 1998    60      20.1   42.84   0       0.35
6. Apr 1998    9.76    20.1   49.80   0       0.205
7. Apr 1998    15.71   20.1   36.16   4.04    0.701
8. May 1998    9.40    20.1   43.91   0       0.319


The control chart is shown in Figure 5.

Figure 5. Control Chart for Code Peer Review Defects (series plotted: Plot, CL, UCL, LCL)

An anomaly occurred in the fifth sample. Causal analysis revealed that the data for that sample were
for database code, while all others were for applications code. Control charts require similar data from
similar processes, that is, an apples-to-apples comparison. The database sample was removed and the data
charted again as shown in Figure 6.



Figure 6. Control Chart without Database Defects 

The process is now under statistical process control. The root cause is that data gathered from
dissimilar activities cannot be combined on the same control chart. Data from
design cannot be combined with data from coding. The process for database design and code is
different from that used for applications design and code, as are the teams and methodologies.
The defect prevention is therefore directed at the process of collecting data for SPC control charts.

Example 2

Table 3 shows raw data collected during code peer reviews. 

Table 3. Code Peer Review Defects 



Sample         Units   SLOC    Defects   Defects/KSLOC
1. Feb 1997    17      1705    62        36.36
2. Mar 1997    18      1798    66        36.70
3. Mar 1997    15      1476    96        65.04
4. Mar 1997    19      1925    57        29.61
5. Mar 1997    17      1687    78        46.24
6. Apr 1997    18      1843    66        35.81
Totals         104     10434   425



The calculations are shown in Table 4. 

Table 4. Calculations for Code Peer Review Defects 



Sample         Plot   CL      UCL     LCL     a(i)
1. Feb 1997    36.4   40.73   55.4    26.09   1.7
2. Mar 1997    36.7   40.73   55.01   26.45   1.8
3. Mar 1997    65.0   40.73   -       24.97   1.5
4. Mar 1997    29.6   40.73   -       26.93   1.9
5. Mar 1997    46.2   40.73   -       25.99   1.7
6. Apr 1997    35.8   40.73   54.84   26.63   1.8


The control chart is shown in Figure 7. 



The process is out of statistical process control at the third sample. Causal analysis revealed that
this occurred when the project introduced coding standards and many coding standards violations were
injected. The root cause is lack of knowledge of the coding standards, and the defect prevention
is to provide training whenever a new process or technology is introduced.
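
The check for out-of-control samples can be sketched in a few lines. This is only an illustration under the U-chart formulas given earlier, using the SLOC and defect counts from Table 3; the function name `out_of_control` is an assumption, not something from the project.

```python
import math

# (SLOC reviewed, defects found) for the six samples in Table 3
samples = [(1705, 62), (1798, 66), (1476, 96), (1925, 57), (1687, 78), (1843, 66)]

def out_of_control(samples):
    """Return 1-based indexes of samples whose defect rate is outside the U-chart limits."""
    total_sloc = sum(sloc for sloc, _ in samples)
    total_defects = sum(defects for _, defects in samples)
    cl = total_defects * 1000.0 / total_sloc       # center line, defects per KSLOC
    flagged = []
    for index, (sloc, defects) in enumerate(samples, start=1):
        rate = defects * 1000.0 / sloc             # plotted value for this sample
        spread = 3.0 * math.sqrt(cl / (sloc / 1000.0))
        if rate > cl + spread or rate < max(0.0, cl - spread):
            flagged.append(index)
    return flagged

print(out_of_control(samples))
```

With these data only the third sample is flagged, in line with the analysis above.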


Example 3 

During integration thread tests, the defects were categorized against the test plan, test data, code 
logic, interfaces, standards, design, and requirements. Defects against these attributes are shown 
in Table 5. 


Table 5. Thread Test Defects

Test data would not be expected to have the majority of defects. The root cause was that the test
data in the test procedures had not been peer reviewed. The defect prevention is to peer review 
the test procedures and the test data. 

Example 4 

During preliminary design and prior to acquiring hardware, a simulated performance model was 
used to monitor critical computer resources. Figure 9 shows some results of monitoring general 
purpose MIPS. 



Figure 9. General Purpose MIPS (horizontal axis: Date, Mar-94 through Jun-97)

Around November 1995 many new requirements were added to the system and the architecture’s 
MIPS threshold was threatened because of increased computations. In May 1996 additional 
MIPS were added to the hardware design and the problem was corrected. 

Conclusion 

Statistical process control and control charts can be used effectively in a software
setting. SPC can identify undesirable trends and point out fixable problems and potential process
improvements. Control charts can show the capability of the process, so achievable goals can be 
set. They can provide evidence of process stability, which can justify predicting process 
performance. 

Deliver a near defect free system

Meet all Critical Computer Performance Goals

Statistical Process Control Charts

According to the Normal Distribution, about 99.7% of all normal random values
lie within +/-3 standard deviations of the mean, that is, 3 sigma

Provide evidence of process stability, which justifies
predicting process performance

Calculating the limits

Code Peer Reviews Control Chart

Analysis revealed that this was caused when the project introduced coding
standards and many coding violations were introduced

Example

Critical Computer Resources

The customer introduced many new requirements around Nov/Dec 1995

The model revealed that the MIPS threshold was threatened with increased computations

More MIPS were added to the architecture in May 1996

When LCL is negative it is set to zero

Revealed that data were for database code and applications code
Control charts require similar data for similar processes
Apples to apples analogy

Defect Prevention Example (Cont.)

Process is now under statistical process control

The defect prevention is against the process of collecting
data for SPC control charts

Totals 6 102

Bar Chart for Thread Tests

Test data would not be expected to have the majority of defects
The root cause is that test procedures had not been peer reviewed
The defect prevention is to peer review test procedures

• HMI is Human Machine Interface, Others are Applications

Again, dissimilar activities cannot be used on the same
statistical process on control charts

Abstract

Inspections and testing represent core techniques to ensure reliable software. Inspections also seem to 
have a positive effect on predictability, total costs and delivery time. 

This paper presents a case study of inspections and testing, done at the Ericsson development
department outside Oslo in Norway. This department develops and maintains customer-defined
services around AXE phone switches, i.e. the functionality around the “star” and “square” buttons on
house telephones.

AXE development at Ericsson world-wide uses a simple, local experience database to record 
inspections and testing data. Two MSc students from NTNU have been given access to such historical 
data in 1997 [Marjara97] and 1998 [Skaatevik99]. The results from these two diploma theses 
constitute the basis for this paper. 

The paper will study questions such as: 

- The effectiveness and cost-effectiveness of inspections, 

- The cost-effectiveness and defect profile of inspection meetings vs. individual reading, 

- The relation between complexity/modification-rate and defect density, 

- Whether the defect density for modules can be predicted from inspections for later phases and 
deliveries. 

The paper is organized as follows: Section 1 summarizes some relevant parts of the state of the art, 
especially of inspections. Section 2 first describes the Ericsson context, and Section 3 describes 
questions and hypotheses for the study. Section 4 describes the organization of the study, and 
Section 5 presents and discusses the results. Section 6 sums up the paper and recommends some 
future work. 

Preface

The paper will present results from two MSc theses at NTNU, that have analyzed historical defect 
data at Ericsson in Oslo, Norway — related to their AXE switches. Ericsson has practised Gilb 
inspections for many years, and collects defect data from inspections and testing in a small database. 

These studies revealed that inspections indeed are the most cost-effective verification technique. 
Inspections tend to catch 2/3 of the defects before testing, by spending 10% of the development effort 
and thereby saving about 20% of the effort (by earlier defect correction, a "win-win"). Inspection 
meetings were also cost-effective over most testing techniques, so they should not be omitted. 
Inspection meetings also found the same type of errors (Major, Super Major) as individual 
inspections. 

We also found that there is a correlation between module complexity, modification rate, and the
defect density found during field-use, but not during inspections and test. Due to missing data, we
could not find out whether the defect density of modules repeated itself across inspection/test phases
and over several deliveries, i.e. we could not predict "defect-prone" modules. Defect classification
was also unsatisfactory, and prevented analysis of many interesting hypotheses.


1. State of the art 

Quality in terms of reliability is of crucial importance for most software systems. 

Common remedies are sound methods for system architecture and implementation, high-level 
languages, formal methods and analysis, and inspection and testing techniques. Especially the latter 
two have been extensively described in the literature, and vast empirical materials have been 
collected, analyzed and published. This paper only refers to general test methods, so we will not 
comment on these here. 

Inspections were systematized by Fagan [Fagan76] [Fagan86] and represent one of the most important 
quality assurance techniques. Inspections prescribe a simple and well-defined process, involving 
group work, and have well-defined metrics. They normally produce a high success rate, i.e. by
spending 10% of the development effort, we diagnose 2/3 of the defects before testing, and save 20% 
of the total effort — a win-win: so “quality is free”. Inspections can be applied on most documents, 
even requirements [Basili96]. They also promote team learning, and provide a general assessment of 
reviewed documents. 

Current research topics include:

■ The role of the final inspection meeting (emphasized by Tom Gilb [Gilb93], see also [Votta93]).

■ When to stop inspections?

■ When to stop testing, cf. [Adams84]?

■ The effect of root-cause-analysis on defects. 

■ The role of inspection vs. testing in finding defects, e.g. their relative effectiveness and cost- 
effectiveness. 

■ The relationship between general document properties and defects. 

■ Defect densities of individual modules through phases and deliveries. 

Our research questions and hypotheses deal with the three latter. 


2. The company context 

Ericsson employs about 100,000 people world-wide, of whom 20,000 work in development. They have
company-wide and standardized processes for most kinds of software development, with adaptations
for the kind of work being done. Ericsson has adopted a classical waterfall model, with so-called
"tollgates" at critical decision points. In all this, verification techniques like inspections and testing are
crucial. Inspection is done for every life-cycle document, although we will mostly look at design and
code artifacts. Testing consists of unit test, function test and system test, where the two latter may be
done at some integration site different from the development site (e.g. Stockholm).

We will only study design inspections (in groups), simplified code reviews (by individuals) and partly
testing in this paper.

The inspection process at Ericsson is based on techniques originally developed by Michael Fagan
[Fagan76] at IBM and refined by Tom Gilb [Gilb93]. The process is tailor-made by the local
development department. In addition there is a simplified code review done by individual developers
(data from code review and unit test are sometimes merged into a “desk check”). Thus full inspections
are only done upon design documents in our studies. Data from inspections/reviews and testing are
collected in a simple, proprietary database and used for local tuning of the process. Defects are
classified as Major, Super Major and Questions (the latter is omitted here), with no deeper
categorization.

We have studied software development at the Ericsson site outside Oslo. It just passed CMM level 2 
certification in Oct. 1998, and aims for level 3 in year 2000. The Oslo development site has about 400 
developers, mostly working on software. The actual department has about 50 developers, and works 
mostly on the AXE-10 digital software switch, which contains many subsystems. Each subsystem
may contain a number of modules. The development technology is the SDL design language (SDT tool
from Telelogic) and Ericsson's proprietary PLEX language from the late 1970s (with its own compilers and
debuggers).

Figure 1. Basic inspection process at Ericsson for design artifacts (documents). The figure, titled
“The first level inspection process”, has a Participants column (Moderator; Whole team; Inspectors,
individually; Whole team; Interested parties; Interested parties; Author; Moderator) and a Duration
column (10-15 minutes; maximum 2 hours, the specified fixed rates must be followed; maximum 2 hours,
the inspection rates must be followed; optional; optional).


Special inspection groups are formed, called product committees (PC), to take care of all impacts on 
one subsystem. In this paper, we will only look at subsystem-internal inspections, not across 
subsystems. The inspection process is indicated in figure 1 above, and follows Fagan/Gilb 
inspections wrt. overall set-up, duration etc. The number of inspectors per document is typically 3-4. 
Special check-lists are used for each document type. 

The different types of documents are presented in Table 1 below.

Table 1. Document types (18 such).

Abbreviation   Document type
ADI            Adaptation Direction
AI             Application Information
BD             Block Description
BDFC           Block Description Flow Chart
COD            Command Description
FD             Function Description
FDFC           Function Description Flow Chart
FF             Function Framework
FS             Function Specification
FTI            Function Test Instruction
FTS            Function Test Specification
IP             Implementation Proposal
OPI            Operational Instruction
POD            Printout Description
PRI            Product Revision Information
SD             Signal Description
SPL            Source Parameter List
SPI            Source Program Information

Each of these document types has specific, recommended inspection rates [Skaatevik99].


3. Questions and hypotheses 

3.1 One Observation 

O1: How (cost-)effective are inspections and testing?

3.2 Three Questions

Q1: Are inspections performed at the recommended inspection rates?

Q2: How cost-efficient are the inspection meetings? 

Q3: Are the same kind of defects found in initial inspection reading and following inspection 
meetings? 

3.3 Three Hypotheses 

For each question we present one null hypothesis, H0, which is the one that will actually be tested, and
an alternative hypothesis, HA, which may be considered valid if the null hypothesis is rejected. For the
statistical tests presented in this paper, a significance level (p-level) of 0.10 is assumed.

The three alternative hypotheses are: 


H1: Is there a significant, positive correlation between defects found during field-use and document
complexity?

H2: Is there a significant, positive correlation between defects found during inspection/test and
document complexity?

H3: Is there a significant correlation between defect rates across phases and deliveries for individual
documents/modules (i.e. try to track "defect-prone" modules)?


4. Organization of the study 

We have performed two studies where we have collected and analyzed historical data from the software
department at Ericsson in Oslo. Torbjørn Frotveit, our contact at Ericsson, has
furnished us with the requested data throughout.

This paper presents results from these two studies of inspection and testing: 

♦ Study 1: This is the work done in a diploma thesis from 1997 [Marjara97]. Marjara investigated
inspection and test data from Project A of 20,000 person-hours (14 person-years). Defect data in
this work included inspection, desk check, function test, system test and partly field-use.

♦ Study 2: This is the follow-up work done in the diploma thesis from 1998 [Skaatevik99]. This
thesis has data from 6 different projects (Project A-F), including the project Marjara used in Study
1. It represents over 100,000 person-hours (70 person-years). The test data in this work include
only data from inspection and desk check, since later testing was done by other Ericsson
divisions. However, it was possible to split desk check into code review and unit test, and data from
these two activities are presented. Data from field-use are not included, for the same reasons as for
function and system test.


Threats to internal validity: 

We have used standard indicators from the literature on most properties (defect densities, inspection
rates, effort consumption etc.), so all in all we are on accepted ground. However, wrt. module
complexity we are unsure, and further studies are needed. Whether the recorded defect data in the
Ericsson database are trustworthy is hard to say. We certainly have discovered inconsistencies and
missing data, but our confidence is reasonably high.


Threats to external validity: 

Since Ericsson has standard working processes world-wide, we can assume at least company-wide
relevance. Moreover, many of the findings are in line with previous empirical studies, so we feel
confident at a general level.

5. The results and evaluation of these


This section presents the results from the two studies described in the previous section (4), and tries to
answer the questions and hypotheses stated in section 3.
Two definitions will be used throughout this section, effectiveness and cost-effectiveness:

Effectiveness: the degree to which a certain technique manages to find defects, i.e. diagnosed defect 
rate (defects per “volume-unit”), regardless of cost. This is sometimes called efficacy. 

Cost-effectiveness: effort spent to find one defect. 


5.1 O1: How (cost-)effective are inspections and testing?

Here we shall describe and compare the effectiveness and cost-effectiveness of inspections and testing
at Ericsson in Oslo. The effort spent before individual reading is proportionally distributed over
inspection reading and inspection meetings. The inspection-phase effort spent after inspection
meetings is similarly merged into “defect fixing” (see Figure 1). Table 2 is taken from Study 1 and
shows the effectiveness of inspections and testing. All efforts are in person-hours, sometimes just
called hours.

Table 2. Efficiency: total defects found. Study 1.

Activity                                Defects [#]   [%]
Inspection reading, design              928           61.8
Inspection meeting, design              29            1.9
Desk check (code review + unit test)    404           26.9
Function test                           89            5.9
System test                             17            1.1
Field-use                               35            2.3
Total                                   1502          100.0


Table 2 shows that inspections are the most effective verification activity, finding almost 64% of total
defects found in the project. Second best is the desk check that finds almost 27%. We also see that 3%
of the defects found by inspections are found in the meetings. To analyze which of the verification
activities are most cost-effective, the effort spent on the different activities was gathered. Table 3
shows the effort (person-hours) spent on the six verification activities.

Table 3. Effort and cost-efficiency on inspection and testing. Study 1.

Activity                      Defects   Detection    Cost-effectiveness   Fixing       Estimated saved
                              [#]       effort [h]   [h:m per defect]     effort [h]   effort [h]
Inspection reading, design    928       786.8        00:51                311.2        8200
Inspection meeting, design    29        375.7        12:57
Code review and unit test     404                    03:07                -            -
Function test                 89        7000.0       78:39                -            -
System test                   17        -            -                    -
Field-use                     35        -            -                    -

Detection effort is the total effort on defect detection, fixing effort is the total effort on defect
fixing, and estimated saved effort is the effort saved by early defect removal (“magic formulae”).

When combining effort and number of defects, inspections proved to be the most cost-effective. Not
surprisingly, function test is the most expensive activity (note: we have no effort data on system test).
It should be noted that only human labor is included for desk check (code review and unit test) and 
function test. The costs of computer hours or special test tools are not included. Neither is the human 
effort spent in designing the test cases. 

In Study 2 it was not possible to get defect data from function test, system test and field-use 
(representing 9.3% of the defects in Study 1). Instead the data made it possible to split up the desk 
check, which actually consists of code review and unit test (emulator test). Table 4 shows the results.

Table 4. Efficiency: total defects found. Study 2.

Activity                      Defects [#]   [%]
Inspection reading, design    4478          71.1
Inspection meeting, design    392           6.2
Desk check, code              832           13.2
Unit test, code               598           9.5
Total                         6300          100.0


Again, the data show that inspections are highly effective, contributing 77% of all the defects found
in the projects. Desk check is second best, finding about 13% of the defects in the projects.
Compared to Study 1, there is an improvement in the inspection meeting, whose share of the
defects found during inspections has increased from 3% to 8%.

Table 5 shows the effort (person-hours) of the different activities from Study 2. In this study, no data
from function test or later tests were available.

Table 5. Effort and cost-efficiency on inspection and testing. Study 2.

Activity                      Defects   Detection    Cost-effectiveness   Fixing       Estimated saved
                              [#]       effort [h]   [h:m per defect]     effort [h]   effort [h]
Inspection reading, design    4478      5563         01:15                11737        41000
Inspection meeting, design    392       3215         08:12
Desk check, code              832       2440         02:56                -            -
Unit test, code               598       4388         07:20                -            -

Detection effort is the total effort on defect detection, fixing effort is the total effort on defect
fixing, and estimated saved effort is the effort saved by early defect removal (“magic formulae”).


The inspection meeting itself is more cost-effective in Study 2 (8h:12min per defect) than in Study 1
(12h:57min per defect).
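
The cost-effectiveness figures follow directly from the definition "effort spent to find one defect". A minimal sketch of that arithmetic, assuming values are rounded to the nearest minute (the helper name `hours_per_defect` is illustrative only), is:

```python
def hours_per_defect(effort_hours, defects):
    """Format detection effort per defect as hh:mm, rounded to the nearest minute."""
    total_minutes = round(effort_hours * 60 / defects)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"

# (activity, detection effort in person-hours, defects found), from Tables 3 and 5
rows = [
    ("Study 1 inspection reading", 786.8, 928),
    ("Study 1 inspection meeting", 375.7, 29),
    ("Study 2 inspection reading", 5563, 4478),
    ("Study 2 inspection meeting", 3215, 392),
]
for activity, effort, defects in rows:
    print(activity, hours_per_defect(effort, defects))
```

Under that rounding assumption the sketch gives the same hh:mm values as the two effort tables for these four rows.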

In Study 2 covering 100,000 person-hours, a total of 20,515 person-hours were spent on inspections 
(including 11,737 person-hours on defect fixing). It has been calculated that inspections did save 
41,000 person-hours, which would have been necessary to locate and correct defects otherwise found 
by later testing. That is, a net saving of 21% of the total project effort. 

Study 1 covered 20,000 person-hours where 1474 person-hours were spent on inspections (including
311.2 person-hours on defect fixing). In this study it was calculated that Ericsson saved 8200
person-hours, or a net saving of 34%!

5.2 Q1: Are inspections performed at the recommended inspection rates?

Here we want to see if the recommended inspection rates were actually applied. The results are
presented in Table 6. Note that all this applies to design documents, not source code.

Table 6. Recommended rate versus actual total effort during inspections in Studies 1 and 2.

Type of effort                          Total inspection effort        Share [%]
                                        including defect fixing [h]
Actual effort, Study 1                  1474                           54%
Recommended inspection rate, Study 1    2723                           -
Actual effort, Study 2                  20,515                         78.6%
Recommended inspection rate, Study 2    26,405                         -


Thus in Study 2, inspections are performed too fast. Only 20,515 person-hours are actually spent on
inspections including defect fixing, being 78.6% of the recommended expenditure of 26,405
person-hours. The average number of defects per page is 0.43.

Study 1 showed an even larger deviation, as only 54% (1474 actual person-hours out of
2723 recommended person-hours) were used in total during inspections including defect fixing.

As reported elsewhere, plots of reading rate and defect detection rate (see Figure 2) show that the
number of defects found per page decreases as the number of inspected pages (document length) per
hour increases. Inspections performed too fast will then result in a decreased detection rate. However, we
have not done any (re)analysis of “optimal” reading rates here. Also note that the individual reading
rate is a part of the total inspection rate mentioned e.g. in Table 6.

Figure 2. Number of pages inspected and defect detection rate. Study 1.

5.3 Q2: How cost-efficient are the inspection meetings?


Table 7 shows the effort consumption for each step of the inspections including defect fixing from 
Study 2. Effort before individual reading and inspection meeting has been proportionally distributed 
on these two activities. 


Table 7. Effort consumption for inspection and defect fixing. Study 2. 



                 Inspection   Inspection   Defect    Sum
                 reading      meeting      fixing
Person-hours     5563         3215         11737     20515
[%]              27.12%       15.67%       57.21%    100.00%


Note that 57.2% of the “inspection-time effort” is spent on defect fixing in Study 2 (11,737 of 20,515
person-hours), while only 21.1% is spent on defect fixing (311.2 out of 1473.7 person-hours) in Study 1.
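
The effort shares quoted here can be rechecked with a short calculation. The sketch below is illustrative only: the function name and dictionary keys are assumptions, while the person-hour figures are the ones reported for the two studies.

```python
def effort_shares(efforts):
    """Map each inspection step to its percentage of the summed effort."""
    total = sum(efforts.values())
    return {step: 100.0 * hours / total for step, hours in efforts.items()}

# Person-hours per inspection step, as reported for each study
study2 = {"reading": 5563, "meeting": 3215, "defect fixing": 11737}
study1 = {"reading": 786.8, "meeting": 375.7, "defect fixing": 311.2}

for name, efforts in (("Study 2", study2), ("Study 1", study1)):
    shares = effort_shares(efforts)
    print(name, {step: round(pct, 1) for step, pct in shares.items()})
```

The defect-fixing share comes out at roughly 57% for Study 2 and 21% for Study 1, which matches the percentages stated above.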

Table 8, from Study 2, shows the number of defects recorded in reading, in meetings, and in total.


Table 8. Cost-effectiveness and defect classification from inspections. Study 2.

                      Major defects    Super Major defects   Sum defects   Defect detection   Cost-effectiveness
                      [#]     [%]      [#]     [%]           [#]           effort [h]         [h:m per defect]
Inspection reading    4356    97.2%    122     2.7%          4478          5563               01:15
Inspection meeting    380     96.9%    12      3.1%          392           3215               08:12
Entire inspection     4736    97.2%    134     2.7%          4870          8778               01:48


As mentioned, the defects are classified in two categories:

♦ Major: Defects that can have a major impact later, that might cause defects in the end products, 
and that will be expensive to clean up later. 

♦ Super Major: Defects that have major impact on total cost of the project. 

In Study 2, 8% of the defects found by inspections are found in the meetings, at a cost of 8 h 12 min
of person-effort per defect. Compared to function test and system test, inspection meetings are
indeed cost-effective in defect removal.
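
The cost-effectiveness column of Table 8 is the detection effort divided by the number of defects found, expressed as hours and minutes. The following sketch reproduces it from the table values; the helper name `hours_per_defect` and the data layout are assumptions made for illustration.

```python
def hours_per_defect(effort_hours, defects):
    """Return the detection cost per defect as an 'hh:mm' string."""
    total_minutes = round(60 * effort_hours / defects)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


# (detection effort in person-hours, defects found), copied from Table 8.
table8 = {
    "inspection reading": (5563, 4478),
    "inspection meeting": (3215, 392),
    "entire inspection": (8778, 4870),
}

cost = {step: hours_per_defect(effort, found) for step, (effort, found) in table8.items()}
for step, value in cost.items():
    print(f"{step:20s} {value} per defect")

# Share of all inspection defects that were first found in the meeting.
meeting_share = 100.0 * table8["inspection meeting"][1] / table8["entire inspection"][1]
print(f"Found in meetings: {meeting_share:.0f}%")
```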


5.4 Q3: Are the same kinds of defects found in initial inspection reading and following inspection meetings?

We would also like to investigate what types of defects are found during inspection reading versus
inspection meetings. Note: We do not have data on whether inspection meetings can refute defects
reported from individual reading (“false positives”), cf. [Votta93]. Our data only report new defects
from inspection meetings (“true negatives”). Table 8, from Study 2, shows that in total 2.7% of all
defects from inspections are of type Super Major, while the rest are Major.

For inspection reading, the Super Major share is 2.7%. For inspection meetings the share is 3.1%, i.e.
only slightly higher. We therefore conclude that inspection meetings find the same “types” of defects
as individual reading.

No such data were available in Study 1. 
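
The shares compared above can be recomputed from the defect counts in Table 8. This is a minimal sketch with illustrative names; it only restates the arithmetic and does not test whether the difference is statistically significant.

```python
# Defect counts per inspection step in Study 2, copied from Table 8.
defects = {
    "inspection reading": {"major": 4356, "super_major": 122},
    "inspection meeting": {"major": 380, "super_major": 12},
}


def super_major_share(counts):
    """Return the percentage of defects classified as Super Major."""
    total = counts["major"] + counts["super_major"]
    return 100.0 * counts["super_major"] / total


share_by_step = {step: super_major_share(counts) for step, counts in defects.items()}
for step, share in share_by_step.items():
    print(f"{step:20s} {share:.1f}% Super Major")
```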

5.5 H1: Correlation between defects found during field-use and document complexity

Intuitively, we would say that defects detected in field-use could be related to the complexity of the
module and to the modification rate of the module. The modification rate indicates how much the
module is changed from the base product, and the complexity is represented by the number of states
per module (taken from a state machine diagram and reported by TeleLogic’s SDL tool, called SDT).
For new modules the modification grade is zero. Correlation between modules and defect rates for
each module (i.e., not the absolute number of defects, but defects per volume-unit) has not yet
been properly checked.

In Study 1, the regression equation can be written as:

$N_{fu} = \alpha + \beta N_s + \lambda N_{mg}$

where $N_{fu}$ is the number of defects (faults) in field-use, $N_s$ is the number of states, $N_{mg}$ is the
modification grade, and $\alpha$, $\beta$, and $\lambda$ are coefficients. H1 can only be accepted if $\beta$ and $\lambda$ are
significantly different from zero and the significance level for each of the coefficients is better than
0.10. The following values were estimated:

$N_{fu} = -1.73 + 0.084 N_s + 0.097 N_{mg}$
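
To illustrate how the estimated equation is read, the sketch below evaluates it for one module. The coefficients are those estimated above; the function name and the example module (40 states, modification grade 10) are hypothetical and are not data from the study.

```python
def predicted_field_defects(n_states, modification_grade):
    """Evaluate the estimated Study 1 regression line for one module."""
    alpha, beta, lam = -1.73, 0.084, 0.097  # coefficients estimated in Study 1
    return alpha + beta * n_states + lam * modification_grade


# Hypothetical module, chosen only to show the arithmetic.
example = predicted_field_defects(n_states=40, modification_grade=10)
print(f"Predicted field-use defects: {example:.2f}")
```

With only eight observations behind the fit, such a prediction is an indication at best.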


Predictor    Coefficient    StDev    t        P
Constant     -1.732         1.067    -1.62    0.166
States       0.084          0.035    2.38     0.063
Modrate      0.097          0.034    2.89     0.034


Here s = 1.200, $R^2$ = 79.9%, and $R^2$(adj) = 71.9%, where s is the estimated standard deviation about
the regression line, $R^2$ is the coefficient of determination, and $R^2$(adj) is the same quantity adjusted
for degrees of freedom. If a variable is added to an equation, $R^2$ will get larger even if the added
variable is of no real value. To compensate for this, $R^2$(adj) is chosen as the coefficient of determination.

The values of the estimated coefficients are given above, along with their standard deviations, the
t-value for testing whether each coefficient is 0, and the p-value for this test. The analysis of variance
is summarised below:


Source        DF    SS       MS       F       P
Regression    2     28.68    14.34    9.96    0.018
Error         5     7.20     1.44
Total         7     35.88

In the table above, DF is the degrees of freedom, SS is the sum of squares corrected for the
mean, MS is the mean sum of squares, F is the Fisher test statistic for the F-test, and P is the
significance level for this test.

It should be noted that the constant term is not significant, but that the coefficients for states and
modification rate are significant at the 0.10 level. The F-test is also significant, and therefore the
hypothesis H1 can be accepted, based on the results from the regression analysis.
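
The summary statistics quoted above can all be derived from the two sums of squares in the analysis of variance table. The sketch below does so with the standard formulas; the function names are illustrative, and the p-value uses the closed form of the F distribution that holds when the numerator has exactly two degrees of freedom, as is the case here.

```python
import math


def regression_summary(ss_regression, ss_error, n_observations, n_predictors):
    """Derive F, R^2, adjusted R^2 and s from the ANOVA sums of squares."""
    df_error = n_observations - n_predictors - 1
    df_total = n_observations - 1
    ss_total = ss_regression + ss_error
    ms_regression = ss_regression / n_predictors
    ms_error = ss_error / df_error
    return {
        "F": ms_regression / ms_error,
        "R2": ss_regression / ss_total,
        "R2_adj": 1.0 - ms_error / (ss_total / df_total),
        "s": math.sqrt(ms_error),
        "df_error": df_error,
    }


def f_p_value_two_numerator_df(f_stat, df_error):
    """Upper-tail probability of an F(2, df_error) variable (closed form)."""
    return (1.0 + 2.0 * f_stat / df_error) ** (-df_error / 2.0)


# Study 1: 8 modules (total DF = 7) and 2 predictors (states, modification rate).
summary = regression_summary(ss_regression=28.68, ss_error=7.20,
                             n_observations=8, n_predictors=2)
p_value = f_p_value_two_numerator_df(summary["F"], summary["df_error"])
print({name: round(value, 3) for name, value in summary.items()}, round(p_value, 3))
```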

5.6 H2: Correlation between defects found during inspection/test and document complexity


The relevant data come from Study 2. Because only some of the modules are found over several
lifecycles, only 12 modules out of 443 could be used for this analysis. That so few modules qualified
suggests that we should have examined relations between phases within the same lifecycle more
thoroughly, not just relations between different lifecycles.


Since data are collected for each document type, and each module in each phase consists of a different
number of document types, one document type is selected through all the phases. The document type
selected is BDFC (Block Description Flow Chart). Table 9 shows the results. Fields marked with “-”
mean that the data are missing or that no module exists. Because all the modules presented in this
table were included only in projects A through E, project F was excluded.


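Table 9. Defect data for BDFC documents over different modules and projects, Study 2.

Each project cell lists: Def/page / Complexity / Defects found in basic test.

Module name | Project A | Project B | Project C | Project D | Project E
SUSAACA | [illegible] / 72.0 / [illegible] | [illegible] / 80.5 / 3 | [illegible] / - / - | - / - / - | - / - / -
SUSAACT | [illegible] / 177.5 / [illegible] | [illegible] / 179.0 / 4 | - / - / - | - / - / - | - / - / -
SUSCCTB | [illegible] / 117.5 / 58 | [illegible] / 120.5 / [illegible] | - / - / - | - / - / - | - / - / -
SUSCR | [illegible] / - / - | [illegible] / 95.5 / [illegible] | - / - / - | [illegible] / 89.00 / - | - / - / -
SUSCWC | [illegible] / - / [illegible] | [illegible] / - / - | - / - / - | - / - / - | - / - / -
SUSCWHF | [illegible] / - / 11 | [illegible] / - / - | - / - / - | - / - / - | - / - / -
SUSCWP | [illegible] / [illegible] / 7 | [illegible] / [illegible] / 13 | - / - / - | - / - / - | - / - / -
SUSSCR | [illegible] / [illegible] / 22 | [illegible] / [illegible] / 34 | - / - / - | - / - / - | [illegible] / - / -
SUSACF | [illegible] / 47.0 / [illegible] | [illegible] / 62.5 / [illegible] | - / - / - | - / - / - | [illegible] / 66.0 / -
SUSAP | [illegible] / 67.0 / [illegible] | - / - / [illegible] | - / - / - | - / - / - | [illegible] / 78.0 / -
SUSCCTA | [illegible] / 269.5 / [illegible] | - / 297.5 / [illegible] | [illegible] / 299.5 / 3 | - / - / - | - / - / -
SUSCS | 0.06 / 257.0 / 14 | 0.90 / 267.5 / 34 | 0.18 / 254.5 / 21 | - / - / - | - / - / -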

Each project has data on defects per page found in inspections, the complexity of each module, and
the number of defects found in unit test (here called basic test) for each block.

Hypothesis 2 uses the data presented above and checks whether there exists a correlation between
defects found during inspection/test and complexity for a module. The regression equation used to
state this hypothesis can be written as:

$Y = \alpha X + \beta$, where Y is defect density, X is the complexity, and $\alpha$ and $\beta$ are constants.


H2 can only be accepted if $\alpha$ and $\beta$ are significantly different from zero and the significance level
for each of the coefficients is better than 0.10. The following values were estimated:

$Y = 0.1023 X + 13.595$


Table 10. Estimated values, Study 2

Predictor    Estimate     Standard error    t       P
β            13.595002    18.52051          0.73    0.4729
α            0.1022985    0.093689          1.09    0.2901


This indicates that the linear regression model must be rejected at a significance level of 0.10,
i.e., neither H2 nor H0 can be refuted. More data are therefore needed.

However, Ericsson reports that the best people are often allocated to develop difficult modules and
that more attention is generally devoted to complex software. This may explain why no significant
correlation was found. More studies are needed here in any case.
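
The t column of Table 10 is simply each estimate divided by its standard error, which the sketch below recomputes. The names are illustrative; the p-values are taken from the table as given, since the degrees of freedom of the test are not stated.

```python
# (estimate, standard error) per coefficient, copied from Table 10.
table10 = {
    "beta (intercept)": (13.595002, 18.52051),
    "alpha (slope)": (0.1022985, 0.093689),
}

# p-values as reported in Table 10.
reported_p = {"beta (intercept)": 0.4729, "alpha (slope)": 0.2901}

SIGNIFICANCE_LEVEL = 0.10

t_values = {name: estimate / std_err for name, (estimate, std_err) in table10.items()}
significant = {name: p < SIGNIFICANCE_LEVEL for name, p in reported_p.items()}

for name in table10:
    print(f"{name:18s} t = {t_values[name]:.2f}  significant at 0.10: {significant[name]}")
```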


5.7 H3: Correlation between defect rates across phases and deliveries for individual documents/modules

This hypothesis, from Study 2, uses the same data as hypothesis 2. To check for correlation
between defect densities across phases and deliveries, we have analyzed the correlation between
defect densities for modules over two projects. Because of the lack of data in this analysis, only
Project A and Project B were used (see Table 9). Table 11 shows the correlation results.

Table 11. Correlation between defect density in Project A and B, Study 2. 


Correlation: 0.472 


Defect density in Project A vs. Defect density in Project B 


With a correlation coefficient of 0.4672, we cannot conclude that there exists a significant correlation
between the two data sets. We had only 6 modules with complete data for both projects for this test.
The test should be repeated when a larger data set is available. So neither H3 nor H0 can be
refuted.
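
One common way to judge a correlation coefficient from so few points is the t-test for a Pearson correlation. The text does not say which test was actually applied, so the sketch below is an assumption on our part; it uses the coefficient from Table 11, the six modules mentioned above, and the tabulated two-sided 0.10 critical value of Student's t with 4 degrees of freedom.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Two-sided 0.10 critical value of Student's t with n - 2 = 4 degrees of freedom.
T_CRITICAL_10PCT_DF4 = 2.132

t_stat = correlation_t_statistic(r=0.472, n=6)
is_significant = abs(t_stat) > T_CRITICAL_10PCT_DF4
print(f"t = {t_stat:.2f}, significant at 0.10: {is_significant}")
```

Under these assumptions the statistic stays well below the critical value, which agrees with the conclusion that no significant correlation can be claimed from six modules.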


6. Conclusion 

After analysis of the data, the following can be concluded for Ericsson in Oslo: 

□ Software inspections are indeed cost-effective: they find around 70% of the recorded defects,
take 6% to 9% of the development effort, and yield an estimated saving of 21% to 34%. That is,
finding and correcting defects before testing pays off, so “quality is free”.

□ 7% of the defects from inspections (3% in Study 1, 8% in Study 2) are found during the final
meeting, while 93% are found during the individual reading. Almost the same distribution of
defects (Major, Super Major) is found in both cases. However, Gilb's insistence on finding many
(serious) defects in the final inspection meeting is not supported here.

□ By comparison, [Votta93] reports that 8% of the defects are found in the final inspection meeting.
Votta therefore proposes to eliminate such meetings, since they are costly (7-14 times less
cost-efficient than individual reading in our studies) and since their logistics are bothersome (tying
up many busy people and thus vulnerable to sudden cancellations). However, inspection meetings are
indeed cost-efficient compared to function tests (6 times more cost-effective in Study 1), and
presumably to later tests too. Inspection meetings also fulfill important social functions, like
dissemination of knowledge and promotion of team spirit. At Ericsson they also serve to give an
overall quality check or approval of design documents.

□ Individual reading and individual desk reviews are the most cost-effective techniques to detect
defects, while system tests are the least cost-effective.

□ The recommended inspection rates are not really followed, since only 54% to 79% of the 
recommended effort is being used. 

□ The identified defects in a module do not depend on the module's complexity (number of states) 
or its modification rate, neither during inspections nor during testing. 

□ However, the number of defects for one concrete system (Study 1) in field-use correlated 
positively with its complexity and modification rate. 

□ We had insufficient data to clarify whether defect-prone modules from inspections continued to 
have higher defect densities over later test phases and over later deliveries. 

□ The collected defect data have only been partly analyzed by Ericsson itself, so there is a huge
potential for further analysis.

□ The defect classification (Major and Super Major) is too coarse for causal analysis in order to
reduce or prevent future defects, i.e. a process change, as recommended by Gilb. We also lack
more precise data from function test, system test, and field-use.

It is somewhat unclear what these findings will mean for process improvement at Ericsson. At least
they show that their inspections are cost-effective, although they could be tuned with respect to the
recommended reading rate (number of inspected pages per person-hour, as part of overall inspection rates).

On the other hand, more fine-grained data seem necessary for further analysis, e.g. for Root-Cause
Analysis (also recommended by Gilb). More detailed information is needed on “false positives” and
on overlap in detected defects among inspectors to allow capture-recapture analysis. Such defect
classification seems very cheap to implement at defect recording time, but is almost impossible to
add later. However, Ericsson seems rather uninterested in pursuing such changes, e.g. since “approval
from headquarters” is necessary to modify the current inspection process. Nevertheless, due to a change
in technology platform from SDL and FLEX to UML and Java, Ericsson will in any case have to revise
their inspection process towards object-oriented technologies and corresponding inspection techniques
[Travassos99].

Inspired by these findings, NTNU remains interested in continuing its cooperation with Ericsson on
defect studies in the context of the SPIQ project. Their defect database seems under-used, so these
studies may encourage a more active utilization of collected data. Further, NTNU has further
longitudinal studies under way at Ericsson, spanning several development phases and release cycles.

Acknowledgements: We thank Torbjørn Frotveit and other contacts at Ericsson for their time and
interest in these investigations, and the Norwegian SPIQ project on Software Process Improvement
for economic support. We also thank Oliver Laitenberger from Fraunhofer IESE, Kaiserslautern for
insightful comments.

1. Introduction and Motivation

Distributed computing applications for the 21st century are network centric, operating in a dynamic environment
where clients, servers, and the network itself all have the potential to change drastically over time. A distributed
application, a system of systems, must be constructed, consisting of legacy, commercial-off-the-shelf (COTS),
database, and new client/server applications that must interact to communicate and exchange information between
users, and allow users to accomplish their tasks in a productive manner. The issue is to promote the use of existing
applications in new and innovative ways in a distributed environment that adds value. To adequately support this
process, the network and its software infrastructure must be an active participant in the interoperation of distributed
applications. Ideally, we are interested in distributed applications that plug-and-play, allowing us to plug in (and
subtract) new “components” as needs, requirements, and even network topologies change over time.

JINI [Arno99, JINI, JINIARCH] is a new architecture built on top of Java’s remote method invocation (RMI) that
promotes the construction and deployment of robust and scalable distributed applications in a network centric
setting. JINI technology is forcing software designers and engineers to abandon the client/server view in order to
adopt a client/services view. In JINI, a distributed application is conceptualized as a set of services (of all resources)
being made available for discovery and use by clients. To accomplish this, JINI makes use of a lookup service,
which is essentially a registry for tracking the services that are available within a distributed environment. Services
in JINI discover and then join the lookup service, registering the services (of each resource) that are to be made
available on the network. Thus, JINI is conceptually very similar to a distributed operating system, in the sense that
resources of JINI are very similar to OS resources. However, in JINI these resources can be dynamically defined
and changed. To illustrate JINI, consider that a service register_for_course(course#) for a Course
database in a University application may be registered with the lookup service. Clients request services by
interacting with the lookup service, e.g., asking for register_for_course(CSE230). The lookup service
returns a proxy to the client for the location of the service. The client then interacts directly with the service via the
proxy to execute the service, e.g., registering for CSE230. In this process, there are a number of important
observations. First, services can come (register and join) and go (leave) with impunity, since all discovery of
services occurs via the lookup service. Second, clients locate and utilize services without knowing their location on
the network, allowing clients to work without interruption as long as “some” service can be located to meet their
needs. Third, the location of clients and/or services on the network can change at any time without impacting the
network or the users.
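
The join, lookup, and invoke steps described above can be sketched as a toy model. This is plain Python written for illustration only: it is not the JINI API, and the class and method names (`LookupService`, `join`, `leave`, `lookup`, `CourseDB`) are our own assumptions. A bound method stands in for the proxy that the lookup service returns.

```python
class CourseDB:
    """Stand-in for a course database resource (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.registrations = []

    def register_for_course(self, course):
        self.registrations.append(course)
        return f"{course} registered at {self.name}"


class LookupService:
    """Toy registry that maps service names to the resources offering them."""

    def __init__(self):
        self._registry = {}

    def join(self, service_name, resource):
        self._registry.setdefault(service_name, []).append(resource)

    def leave(self, service_name, resource):
        self._registry[service_name].remove(resource)

    def lookup(self, service_name):
        providers = self._registry.get(service_name, [])
        if not providers:
            raise LookupError(f"no provider for {service_name}")
        # The bound method plays the role of the proxy handed to the client.
        return getattr(providers[0], service_name)


lookup_service = LookupService()
primary, replica = CourseDB("primary"), CourseDB("replica")
lookup_service.join("register_for_course", primary)
lookup_service.join("register_for_course", replica)

proxy = lookup_service.lookup("register_for_course")
first_result = proxy("CSE230")  # the client talks to the resource directly

lookup_service.leave("register_for_course", primary)  # the primary goes away
second_result = lookup_service.lookup("register_for_course")("CSE230")
print(first_result, "|", second_result)
```

The client code never names a particular resource, which is what allows a replica to take over once the primary has left the registry.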

Our efforts are motivated from two perspectives. First, by Army requirements, we evaluated the JINI technology in
support of present and future systems. Second, as part of a grant from AFOSR on large-scale, multi-agent,
distributed mission planning and execution in complex dynamic environments, we have been considering the ability
of software agents (written using Java) to interact with JINI resources and services. In both efforts, there are a
number of common, fundamental questions:

• Can JINI Support Highly-Available Distributed Applications?

• Can JINI Support an Environment with Dynamic Clients and Replicated Services?

• Will Clients Continue to Operate Effectively if Replicated Services Fail?

• Can JINI be Utilized to Maintain “minutes-off” Data Consistency of Replicas?

• Is JINI Easy to Learn and Use? What is the Maturity Level of JINI Technology?


^ The work in this paper has been partially supported by a contract from the Mitre Corporation (Eatontown, NJ) 
and AFOSR research grant F49620-99-1-0244. 

The reality is that new technologies offer new challenges, with the potential to reap benefits if adopted. However,
for future Army systems, it is important that a careful balance is drawn to opt for mature technologies while 
targeting emerging technologies with potential. The key issue is where JINI fits - as a mature technology or yet 
another one with potential? The remainder of this abstract reviews JINI, our experimental prototyping effort, 
summarizes our results, and proposes a series of future work to answer the question: “Is JINI Ready for Prime 
Time?” 


2. JINI 

Stakeholders (software architects, designers, and implementors) can utilize JINI to construct a distributed
application by federating groups of users (clients) and the resources that they require. In JINI, the resources register
services which represent the functions that are provided for use by clients (and other services). In a sense, the
services are similar in concept to public methods that are exported for usage as part of an application's class library
(API). JINI is versatile, and allows a service to represent any entity that can be used by a person, program (client),
or another service, including: a computation, a persistent store, a communication channel, a software filter, a
real-time data source (e.g., sensor or probe), a hardware device (e.g., printer, display, etc.), and so on. The services
are registered with a lookup service. The registration of services occurs using a leasing mechanism. With leasing, the
services of a resource can be registered with the lookup service for a fixed time period or forever (no expiration).
The lease must be renewed by the resource prior to its expiration, or the service will become unavailable. This
feature, in part, supports high availability, since it requires the resources to constantly reregister their services; if a
resource goes down and does not reregister, the leases on its services expire, and the services will then be
unavailable from the lookup service.
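
The leasing behaviour can be illustrated with a small simulation. Again this is a plain-Python toy rather than JINI code: the class `LeasedRegistry`, its methods, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions made for illustration.

```python
class LeasedRegistry:
    """Toy registry in which every service entry carries a lease."""

    def __init__(self):
        self._expiry = {}  # service name -> expiry time, or None for "forever"

    def register(self, service, now, duration=None):
        """Register a service; duration=None models a lease that never expires."""
        self._expiry[service] = None if duration is None else now + duration

    def renew(self, service, now, duration):
        """Extend a lease; only possible while the lease is still valid."""
        if not self.is_available(service, now):
            raise KeyError(f"lease on {service} has already expired")
        self._expiry[service] = now + duration

    def is_available(self, service, now):
        if service not in self._expiry:
            return False
        expiry = self._expiry[service]
        return expiry is None or now < expiry


registry = LeasedRegistry()
registry.register("register_for_course", now=0, duration=10)
registry.renew("register_for_course", now=8, duration=10)  # resource is alive

available_at_15 = registry.is_available("register_for_course", now=15)
# The resource fails after time 8 and never renews again.
available_at_20 = registry.is_available("register_for_course", now=20)
print(available_at_15, available_at_20)
```

Between the failure and the expiry of the last lease the entry is still listed, which is the time lag that clients have to tolerate.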

As a technology, JINI provides an infrastructure to design and construct distributed applications with a network
centric approach that assumes an environment where there is a requirement for the spontaneous interaction of clients
and services. Spontaneity from a client perspective supports the dynamic behavior of clients, where they enter and
leave the network unpredictably. While connected, clients are guaranteed that either the visible services are
available or that failure can be trapped and handled. Spontaneity from a resource perspective means that when
resources fail, the network can adapt to ensure that redundant services, if available, are now accessible to clients.
Operationally, when a client wishes to interact with a service, the interaction can occur by either a download of code
from service to client, or the passing of a proxy which allows an RMI-like call by the client to the service.

The lookup service is the clearinghouse of a JINI network centric application, since all interactions by resources
(e.g., discovering lookup services, registering services, renewing leases, etc.) and by clients (e.g., discovering
lookup services, searching for services, service invocation, etc.) must occur through the lookup service. When there
are multiple lookup services running on a network, it is the responsibility of the resources to register with them (if
relevant). Clients can interact with multiple lookup services, and in fact, it is possible for groups of clients to be
established that will always consult a particular “close” lookup service, dictated perhaps by network topology or
traffic. Whenever resources leave the environment (either gracefully or due to failure), the lookup service must
adjust its registry. There is a time lag between the resource leaving and the removal of services from the registry.
Clients must be sophisticated enough to be able to dynamically adjust to these situations.

After discovery has occurred, the resources register services on a class-by-class basis. The class is registered as a
service object which contains a Java programming interface to the service, namely, the public methods available to
clients coupled with a set of optional descriptive service attributes. This registration process is referred to as joining
and is shown in Figure 1. In JINI terms, the service object is registered as a proxy, which contains all of the
information that is needed to invoke the service. In the request for service, shown in Figure 1, a client will ask for
the service to register for a course of the CourseDB class based on the signature of the method: status
register_for_course(int). The lookup service will return a service proxy that allows the client to invoke
any or all of the methods defined within the service. Using the proxy, the client invokes the needed method(s) as it
would any other Java method; the call transparently utilizes RMI with the result of the call returned to the client.
The interaction between the client and the resource occurs independently of the lookup service.

A lease is the part of the JINI programming model that allows the resources to set the limits of their utilization of
services, and allows the lookup service to remove services from its registry that are no longer available. A resource
can lease a service to a lookup service forever (not recommended) or lease with a specific expiration date (in
milliseconds). If leased using an expiration date, the resource is responsible for renewing the lease prior to its
expiration. The leasing and renewal process is intended to keep the registry fresh, containing all active and working
services. This is of particular importance in a distributed application where resources leave the network due to
failure or other reasons. When a resource leases its services with specific expiration times and then fails, the lease
expires without being renewed, and the services will no longer be available. In addition, the lookup service
periodically checks to see if services (and resources) are active. Whenever failure occurs, there is a time period
when services will be listed in the registry that are unavailable to clients, and in fact clients will receive exceptions if
they try to execute such services. Thus, even if a client receives the proxy for a service that is active in the registry,
there is no guarantee that the service will be available when invoked. It is therefore imperative that software
engineers design clients that are able to handle this situation.
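
One way a client can cope with a proxy whose service has disappeared is to catch the failure and fall back to the next replica. The sketch below shows that pattern in plain Python; it does not use the JINI or RMI APIs, and `ServiceUnavailable`, `make_replica`, and `invoke_with_failover` are hypothetical names introduced for the example.

```python
class ServiceUnavailable(Exception):
    """Raised when a proxy points at a service that is no longer running."""


def make_replica(name, alive):
    """Return a callable standing in for the proxy of one replicated service."""
    def register_for_course(course):
        if not alive:
            raise ServiceUnavailable(name)
        return f"{course} registered via {name}"
    return register_for_course


def invoke_with_failover(proxies, *args):
    """Try each proxy in turn and return the first successful result."""
    failures = []
    for proxy in proxies:
        try:
            return proxy(*args)
        except ServiceUnavailable as exc:
            failures.append(str(exc))  # stale registry entry; try the next replica
    raise ServiceUnavailable("all replicas failed: " + ", ".join(failures))


# The registry still lists replica-A although it has already failed.
proxies = [make_replica("replica-A", alive=False), make_replica("replica-B", alive=True)]
result = invoke_with_failover(proxies, "CSE230")
print(result)
```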



1. Client Invokes AddCourse(CSE230) on Resource 

2. Resource Returns Status of Invocation 


Figure 1: Join, Lookup, and Invocation of Service. 


3. Experimental Prototyping Effort 

We have taken an experimental prototype approach to evaluate the capabilities of JINI under WinNT to determine if
JINI is “ready for prime time”. The goal of the experimentation is to explore the ability of JINI to support
applications that require high availability (via replication of resources and their services and data) in an environment
where the replicated resources are volatile. Clients, which are also entering and leaving the network, consult the
JINI lookup service to locate and subsequently execute the “services” of the replicated resource that are necessary to
carry out their respective tasks. If one of the services fails, there is a back-up service that can be utilized to support
the client. The replicated databases must be kept consistent, but at any given time point, the data in one database
might be “minutes off” the data in the other databases. Over time the databases will synchronize and contain the
same information. It is crucial that updates not be lost during the modification and synchronization processes.

A total of six experimental prototypes have been developed, modeled on a university application where Persons
(students and faculty) are attempting to access and/or modify information related to a course schedule. Students and
faculty have a GUI (Java client application) through which they must enter their name and password, and once
verified, are able to access course information. To support this, both a PersonDB (for authentication and
authorization) and a CourseDB must be available. These two databases are stored in Microsoft Access, and a Java
application, or database resource, offers a set of “services” that are made available to clients by registration with
JINI. A Java GUI client consults the JINI lookup service to search for appropriate services of the replicated
database resource that can satisfy the requirements of the student/faculty request. Whenever a Java GUI
client modifies the CourseDB as a result of a user request, all other replicated CourseDBs must be modified so that
the replicas remain consistent. However, there may be a time difference where the data in one CourseDB is minutes
off the data in the other CourseDBs. For discussion purposes, Prototype 6 is shown in Figures 2 and 3.

Figure 2: Pre-Lookup Services in Prototype 6.

Figure 3: Execution Process in Prototype 6.

Prototype 6 incorporates a pre-lookup resource and associated services that implement a protocol that supports
simultaneous reads in conjunction with at most one exclusive write, and includes PersonDB and CourseDB services
for use by GUI clients. The pre-lookup services, as shown in Figure 2, allow the locking and unlocking of services,
identify clients (getClientID), and permit replicated database resources to register their services with the
pre-lookup service (addService and rmvService). Thus, clients can still read the data even if one client is holding
a write lock. PersonDB services are for authorization and authentication of the client, while CourseDB services
allow course information to be queried and changed. Figure 3 illustrates the process and steps taken by a client.
After startup, the client applications will be interested in discovering and utilizing services. In Prototype 6, prior to
the JINI lookup service being consulted, the client must first interact with the pre-lookup service, as shown in Figure
3, arrow 1. The client consults with the pre-lookup service by discovering its existence and interacting with the JINI
lookup service to obtain a proxy to request a lock. If a lock on the required service (read, insert, delete, or modify
the CourseDB) is granted, the client can proceed according to arrows 3 through 7 in Figure 3. If a lock is not
granted, the client is told to wait. The pre-lookup service will queue the client’s identifier for the requested service
to ensure that starvation is prevented for clients that are denied locks at the pre-lookup service. Then, Client 1, in
this case, enters a loop which will continuously request the lock (arrow 1) from the pre-lookup service. As long as
another client holds the lock, a wait response will be sent to Client 1. Eventually the client holding the lock desired
by Client 1 will release the lock. When Client 1 next requests the lock and the first element of the queue for the
service contains its identifier, Client 1 will be granted the lock, and processing proceeds via arrows 3 through 7.
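
The request, wait, and release cycle described above can be sketched as follows. This is a simplified plain-Python model of the exclusive lock only, with a first-come, first-served queue per service; the class `PreLookupService` and its method names are our own assumptions and are not taken from the prototype's code.

```python
from collections import deque


class PreLookupService:
    """Toy lock manager: one exclusive lock per service, granted in FIFO order."""

    def __init__(self):
        self._holder = {}   # service -> client id currently holding the lock
        self._waiting = {}  # service -> queue of client ids in arrival order

    def request_lock(self, service, client_id):
        if self._holder.get(service) == client_id:
            return "granted"  # the client already holds this lock
        queue = self._waiting.setdefault(service, deque())
        if client_id not in queue:
            queue.append(client_id)  # remember arrival order to prevent starvation
        if self._holder.get(service) is None and queue[0] == client_id:
            queue.popleft()
            self._holder[service] = client_id
            return "granted"
        return "wait"

    def release_lock(self, service, client_id):
        if self._holder.get(service) != client_id:
            raise ValueError(f"{client_id} does not hold the lock on {service}")
        self._holder[service] = None


pre_lookup = PreLookupService()
responses = [
    pre_lookup.request_lock("modify_course", "client-1"),  # lock is free
    pre_lookup.request_lock("modify_course", "client-2"),  # queued behind client-1
    pre_lookup.request_lock("modify_course", "client-3"),  # queued behind client-2
]
pre_lookup.release_lock("modify_course", "client-1")
responses.append(pre_lookup.request_lock("modify_course", "client-3"))  # not first in queue
responses.append(pre_lookup.request_lock("modify_course", "client-2"))  # head of the queue
print(responses)
```

A waiting client simply repeats its request until the response changes, as in the polling loop described above; the queue guarantees that a later arrival cannot overtake it.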

4. Conclusions and Recommendations 

Our conclusions and recommendations are constructed from a two-fold perspective. First, our efforts on the
experimental prototypes have answered, in part, the questions posed in the introduction, specifically:

• Can JINI Support Highly-Available Distributed Applications? Yes, in fact Prototype 6 demonstrates
that JINI can be utilized to architect solutions that are highly available.

• Can JINI Support an Environment with Dynamic Clients and Replicated Services?
Will Clients Continue to Operate Effectively if Replicated Services Fail? Yes, in Prototype 6, it was
possible to start and stop clients and stop and start resources. As long as JINI was given time to remove
“failed” services, the clients and resources continued to interact effectively.

• Can JINI be Utilized to Maintain “minutes-off” Data Consistency of Replicas? Prototype 6 with the
pre-lookup guaranteed that no updates would be lost if different clients attempted simultaneous updates.

The results are extremely relevant for present and future Army systems, and for distributed enterprise applications
in general, since the different architectural components of the prototypes can be cast as a new Java GUI, legacy
relational databases wrapped using JDBC/ODBC, and databases for authorization and general purpose information
of interest to clients.

Second, is JINI Ready for Prime Time? That is clearly the question of interest. In our limited, yet concentrated
evaluation of JINI, we have found many features that make it extremely attractive as a 21st century technology. Our
reasons for believing JINI is ready for prime time include:

1. Compatibility of JINI with Java write once run anywhere infrastructure. The Java language and 
environment under which JINI operates is extremely homogenous, is operating system independent, and 
promotes interoperability between all of the components (clients and services) within the distributed
application. 

2. Commitment of Sun to Java and JINI technologies, as evidenced by a recent keynote address by
Chief Scientist Bill Joy [BJOY]. There is a significant commitment to JINI by Sun, and an expectation
that JINI will play a major role in the Java arena in the coming years.

3. Understandability and ease of use of JINI. The individuals doing development had Java and database
expertise, but no background in using JINI, Visual Cafe, and JDBC/ODBC. In 400 hours of work over the
two-month period of the work, six prototypes were designed and developed. This speaks to the ease of use
of Java and JINI technologies.

4. High-level abstraction nature of JINI API. From a software engineering perspective, one of the major
strengths of JINI is the ability to design a solution to a distributed application in terms of clients and the
services that are required. This design can be constructed using a UML modeling tool. We believe that
with JINI, UML modeling tools, and Java development environments, good software engineering practices
and products can be attained.

However, our enthusiasm must also be tempered by the fact that our investigation, exploration, and evaluation of 
JINI is only in the initial stages. While our experiences have been mostly positive, there are a number of future work 
topics that must be explored in detail to arrive at a definitive conclusion. 

• Interoperability of JINI with critical technologies. Will JINI work with legacy, COTS, and database
assets? Will JINI inter-operate with CORBA and other distributed computing solutions? Can JINI and
software agent paradigms successfully interact? All are critical to assess JINI’s utility in the 21st century.

• Verification of write-once-run-anywhere. Is the prototype of Section 3 extensible to Win95/98 and Solaris?
Will Oracle, Informix, and other database platforms work? JINI’s readiness for the 21st century must be
verified by conducting multi- and heterogeneous platform experiments.

• Utility/robustness of other JINI technologies. The list includes two-phase commit transactions, events in
JINI, JINI’s security model, and JavaSpaces, an API on top of JINI.

• High-availability via multiple lookups and pre-lookup services. Great care must be taken to explore,
design, and implement prototypes that allow the incorporation of multiple lookup/pre-lookup services to
have a reasonable and manageable impact on client applications.

• Performance and scalability. While our prototypes worked with 3 NTs, in practice, 10s, 100s, and even
1000s of clients and resources will need to interact. Consequently, the ability of JINI to scale and maintain
performance in such a situation will be crucial.

Also, it is important to note that the JINI specification continues to evolve [JINISPEC]. Despite this cautionary note, 
based on our experiences and intuition, we believe that JINI has great promise and will be a successful and useful 
technology for the 21st century.

References 

[Arno99] K. Arnold, et al., The JINI Specification, Addison-Wesley, 1999.

[Edwa99] K. Edwards, Core JINI, Prentice-Hall, 1999.

[Free99] E. Freeman, et al., JavaSpaces Principles, Patterns, and Practice, Addison-Wesley, 1999.

[Morr97] M. Morrison, et al., Java Unleashed, second edition, Sams.net Publishing, 1997.

[Wald99] J. Waldo, “The JINI Architecture for Network-Centric Computing”, Communications of the ACM, Vol.
42, No. 7, July 1999.

[BJOY] http://www.javasoft.com/features/1999/07/bill.joy.html
[JINI] http://www.sun.com/jini/

[JINIARCH] http://www.sun.com/jini/whitepapers/architecture.html
[JINISPEC] http://www.sun.com/jini/specs/jini1_1spec.html

Sample JINI Software:

http://www.enete.com/download/ and http://www.artima.com/javaseminars/modules/Jini/CodeExamples.html
http://www.jinivision.com and http://members.home.net/jeltema

JINI Tutorial:

http://pandonia.canberra.edu.au/java/jini/tutorial/Jini.xml

JINI-Related Information and Links:

http://www.jini.org and http://www.eli.sdsu.edu/courses/spring99/cs696/notes/index.html
http://www.artima.com/objectsjini/introJini.html and http://www.artima.com/jini/resources/index.html

Link for JINI Installation:

http://developer.java.sun.com/developer/products/jini/installation.html

ABSTRACT

Integration of software components into a system can be hindered by incompatibilities between 
the components and system. To predict the possible incompatibilities and the ways to 
overcome them during the integration activities, a classification of incompatibilities can be 
useful for software developers. This can be especially crucial for COTS-based software 
development, where a software system is being built out of potentially highly heterogeneous 
software components. The resulting system can have a complicated architecture due to the 
diversified nature of its components (e.g., a message-based system with object-oriented and 
procedural sub-systems), and the architectural incompatibilities of the COTS products must be 
overcome. Moreover, the functionality of the COTS software products must be taken into 
account during COTS integration. In this paper we present a classification of incompatibilities 
based on the properties of local component interactions. We believe that this classification can 
capture possible problems about software component integration in heterogeneous software 
systems, including architectural and functional issues. 


1. INTRODUCTION. 

Commercial-off-the-shelf (COTS) software is developed by a third party and intended to be part of a new
software system [McDermid, Talbert 97]. Usage of COTS products is growing, because developers
hope that it will increase their systems' quality and reduce development time. However, COTS-based
development implies specific problems (such as selection, integration, maintenance, and
security) whose solutions can be illustrated by answering the following questions:

How to select the most suitable COTS product in the market? 

How to integrate the COTS product into the new system? 

How to maintain a system that has components developed outside? 

How safe is a COTS software product?

These are just a few of the problems. In this paper we discuss COTS integration and its
impact on COTS selection. The importance of discussing COTS selection and integration becomes clear
when considering that COTS products are developed to be generic; once integrated into
a system, however, they are used in a specific context with certain dependencies. Mismatches
between the COTS product being integrated and the system are possible due to their
different architectural assumptions and functional constraints. These mismatches must be overcome
during integration, and they have to be identified even earlier. Thus, a classification of mismatches
or incompatibilities can be useful for COTS selection and integration.

Several publications explore the architectural issues of integration. For instance, [Gacek et al.
95], [Shaw 95], and [Shaw, Clements 96] identify and classify architectural mismatches and styles.
[Abd-Allah, Boehm 96] and [Gacek 98] deal with heterogeneous architectures. This is especially
important for COTS development because a COTS-based software system can be built out of
potentially highly diversified software components, which can result in a heterogeneous
architecture (e.g., a message-based system with object-oriented and procedural sub-systems) for the
software system. However, integrating COTS requires considering not just architectural mismatches,
but also the required functionality, non-functional constraints, and the software developers'
expertise level.

A COTS product can have gaps in required functionality, it can have incompatible interfaces, 
different architectural assumptions, and it can conflict with other system components. Selecting 
suitable COTS products for a project can require finding a trade-off between different mismatches 
depending on the organization’s development capabilities. For example, if an organization has
strong expertise in a functional domain but little experience in coping with architectural problems, it
can consider acquiring COTS products with less of the required functionality but with few architectural
mismatches. Conversely, if an organization is more experienced in architectures than in the
domain, it should select COTS products with as much functionality as possible, although there can
be considerable architectural problems. The right selection can minimize the integration effort.

Therefore, in this work we propose a general classification of possible types of mismatches between
COTS products and software systems, which includes architectural, functional, non-functional, and
other issues. We present a classification of incompatibilities based on the properties of local
component interactions. We believe that this classification captures the possible problems of
software component integration in heterogeneous software systems. We expect that the
incompatibility classification can help to estimate the effort (cost) of integrating COTS
products prior to deciding on a specific one. By utilizing it, software developers can
decide on a COTS product early in the software process, anticipating the possible integration
risks.

This paper has four sections including this introduction. Section 2 deals with the interactions and 
how such concepts can be explored to identify incompatibilities. The third section explores the 
whole model, showing which types of incompatibilities software developers should look for. Also, 
a short example of using such a scheme is presented. Section 4 concludes this discussion and describes
some ongoing work on estimating the cost of COTS integration.


2. INTER-COMPONENT INTERACTIONS AND CLASSIFICATION. 

The incompatibilities, in the context of this work, are essentially failures of components’
interactions, so finding and classifying these interactions will help to find and classify the
incompatibilities. We consider three aspects of inter-component interactions and incompatibilities:
type of interacting component, layer (syntax or semantic-pragmatic), and number of components
participating in the interaction.

First, the components interact with other system components and with the system
environment. System components can be either software or hardware (excluding everything related
to the environment, such as CPU and memory, but including devices directly controlled by the
system, such as on-board devices) that are used by the software system. The environment can be that of
the development phase, which includes compilers, debuggers, and other development tools, or it
can be the environment of the target system, which includes operating systems, virtual machines
(such as Java), interpreters (such as Basic), and other applications and utilities used by the target
system. The parts of both environments can also be considered components. Figure 1 shows the
different perspectives that can be used to classify these software component interactions.



Figure 1. Interactions of software components. 


Second, two main layers can be differentiated in the inter-component interactions:

• Syntax: defines the representation of the syntax rules of the interaction, e.g., the name of the
invoked function; the names, types, and order of the parameters or data fields in the
message, etc. For instance, float SQRT(float x) is C notation for a function called
“SQRT” that returns a real result and takes one argument, a real number x.

• Semantic-pragmatic: defines the functional (semantic and pragmatic) specifications of the
interaction, i.e., what functionality is performed by the component, e.g., invoking the function
"SQRT" calculates the square root of its only argument and returns it to the caller. In
this work we do not consider semantic and pragmatic issues separately.

Finally, an incompatibility can occur in an interaction involving a certain number of 
participating components. A syntax incompatibility can occur because of syntactic difference 
between two components, but a semantic-pragmatic incompatibility can be caused either by just 
one component, two mismatching components, or three or more conflicting components. Thus, 
incompatibilities of the semantic-pragmatic layer can be classified according to the exact number of 
components that caused the interaction to fail. Therefore, the following types of semantic- 
pragmatic incompatibilities can be considered: 

• 1-order semantic-pragmatic incompatibility, or an internal problem, if a component alone
has an incompatibility regardless of the components it is interacting with. It means that the
component either does not have the required functionality (not matching the requirements) or its
invocation can cause a failure (an internal fault).

• 2-order semantic-pragmatic incompatibility, or a mismatch, if an incompatibility is caused
by the interaction of two components. Neither component need have a 1-order incompatibility,
and both can work correctly in other contexts. For example, a procedure that calculates the square
root of a real number receives a negative argument from a caller that supposes that this is a
valid input.

• N-order semantic-pragmatic incompatibility, or a conflict, if an incompatibility is caused by
interactions of several components. There may be no 1-order or 2-order semantic-pragmatic
incompatibilities for these components, but their cumulative interaction can cause a failure. For
example, several processes together require more memory than the available amount, although
each of them can be satisfied independently, so there is an n-order incompatibility on the
semantic-pragmatic layer in interactions with the target platform.
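
The 2-order and n-order examples above can be made concrete with a small Python sketch. It is only an illustration: the function names, the "delta" caller, and the memory figures are assumptions introduced here, not part of the classification itself.

```python
import math
from typing import Sequence


def square_root(x: float) -> float:
    """Hypothetical component: correct on its documented domain, x >= 0."""
    return math.sqrt(x)


def delta(before: float, after: float) -> float:
    """Hypothetical caller: correct on its own, may legitimately be negative."""
    return after - before


def two_order_mismatch(before: float, after: float) -> bool:
    """True if the caller's output violates the callee's assumption."""
    try:
        square_root(delta(before, after))
        return False
    except ValueError:  # math.sqrt rejects negative arguments
        return True


def n_order_conflict(requests_mb: Sequence[int], available_mb: int) -> bool:
    """True if every request fits alone but all of them together do not."""
    each_fits = all(r <= available_mb for r in requests_mb)
    return each_fits and sum(requests_mb) > available_mb
```

Neither `square_root` nor `delta` has a 1-order problem; the failure appears only when they are combined, which is what distinguishes a mismatch from an internal fault.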

According to the assumptions above, syntactical and semantic-pragmatic incompatibilities can 
occur in the system and environment dimensions. Table 1 captures this classification, where the 
cells are described below. 


                                    Type of component
                                 System                  Environment
Type of incompatibility          Software    Hardware    Development    Target
Syntax                           1.1         2.1         3.1            4.1
Semantic-pragmatic 1-order       1.2a        2.2a        3.2a           4.2a
Semantic-pragmatic 2-order       1.2b        2.2b        3.2b           4.2b
Semantic-pragmatic n-order       1.2c        2.2c        3.2c           4.2c


Table 1. Interaction incompatibilities.
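
The cell numbering of Table 1 follows a regular scheme: the column gives the leading digit and the row gives the suffix. A minimal Python sketch of that scheme follows; the identifiers are illustrative assumptions, not names used by the authors.

```python
# Column order and row suffixes as they appear in Table 1.
COMPONENT_TYPES = ("software", "hardware", "development", "target")
ROW_SUFFIXES = {"syntax": "1", "1-order": "2a", "2-order": "2b", "n-order": "2c"}


def cell(component_type: str, incompatibility: str) -> str:
    """Return the Table 1 cell label for a component type and incompatibility type."""
    column = COMPONENT_TYPES.index(component_type) + 1
    return f"{column}.{ROW_SUFFIXES[incompatibility]}"
```

For instance, `cell("hardware", "2-order")` yields the label of the cell describing different assumptions between software and hardware.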


1. Interactions with software 

1.1. Syntax: 

Three different types of syntax incompatibilities can be described here. Although there is only
one cell capturing syntax issues for software in Table 1, its contents allow the
identification of differences/incompatibilities regarding:

• Information flow, e.g., control instead of data.

• Binding: static, dynamic compile-time, dynamic run-time, topological, etc. As a result,
a component cannot find another one.

• Interface protocol: different number of parameters or data fields, or different types of
parameters or data fields.

1.2. Semantic-pragmatic: 

1.2.a. 1-order: internal problem. These incompatibilities appear when the COTS product
does not match the required functionality (e.g., it does not perform a required function), or
when, due to its poor quality, it does not work properly (an internal fault). On the other hand, it
can be other software that is solely responsible for the failure of interaction with the COTS
product.

1.2.b. 2-order: different assumptions between two components, including the
synchronization issue. These incompatibilities result from a mismatch between the
COTS product and other components surrounding it. Even when two components have
correct functionality, they can fail to work together due to some differences (e.g., one object
uses metric units, but another one uses inches, so the result can hardly be correct;
another example is a mismatch between an asynchronous and a synchronous component).

1.2.c. N-order: a conflict between several software components. Even when the COTS
product works correctly itself and correctly interacts with other components, some
incompatibilities can appear as the result of a combined interaction with several other
software components (e.g., an object that controls rotation of a spacecraft receives the
command for rotating by n degrees from a commanding object, but occasionally there is
another commanding object in the system, which sends the same command at the same
time. Every single interaction is correct, but the spacecraft rotates twice as fast as it
should).


2. Interactions with hardware 

2.1. Syntax: 

Different type of protocol. A software component cannot work with a piece of hardware,
because they assume different protocols (e.g., TCP/IP and DECnet, or different port
numbers).

2.2. Semantic-pragmatic:

2.2.a. 1-order: wrong functionality of hardware or the COTS component. A hardware
component does not work correctly (e.g., a printer does not support the Cyrillic alphabet), or
the COTS component causes a failure.

2.2.b. 2-order: different assumptions between software and hardware. An interaction
between software and hardware components does not work correctly (e.g., a program tries
to print a Cyrillic text, but the printer has a different coding for the Cyrillic alphabet,
therefore the output will be unintelligible).

2.2.c. N-order: a conflict between several software components over hardware. An
interaction among several software components and a hardware component does not work
correctly (e.g., several applications simultaneously accessing a single printer).

3. Interactions with the Development Environment 

3.1. Syntax: 

Different components’ representation. The environment does not understand the packaging
of a software component (e.g., a C program cannot be compiled by a Fortran compiler).

3.2. Semantic-pragmatic:

3.2.a. 1-order: wrong functionality of the environment or the COTS component. The
environment does not work properly (e.g., a defect in the compiler version), or the
component has an error (e.g., a program cannot be compiled because of a syntax error in
it).

3.2.b. 2-order: different assumptions between the software component and the
environment. A software component cannot interact with the environment (e.g., a program
is written in an old dialect of the language and cannot be compiled by a newer compiler).

3.2.c. N-order: a conflict between several software components over the environment. An
interaction among several software components and the development environment causes
an incompatibility (e.g., two or more C modules cannot be compiled or linked together
because of a name collision).

4. Interactions with the target environment 

4.1. Syntax: 

Platform type. The environment does not understand the packaging of a software
component (e.g., a program targets another OS, or an interpreter cannot run a program written
in another language).

4.2. Semantic-pragmatic:

4.2.a. 1-order: wrong functionality of the environment or the COTS component. The
environment does not work properly (e.g., the OS crashes), or the component has an error
(e.g., a memory violation in a program).

4.2.b. 2-order: different assumptions between the software component and the environment.
A software component does not interact with the environment correctly (e.g., a different
version of the OS performs some functions used by the component in a way other
than expected by the component’s developers).

4.2.c. N-order: a conflict between software components over the environment, including the
control issue. An interaction among several software components and the environment
causes an incompatibility (e.g., a conflict over the control flow between two object-oriented
frameworks in a one-process program [Sparks et al. 96]).


3. TYPES OF INTEGRATION PROBLEMS. 

Different incompatibilities have different solutions, but generally we can find five groups of related
problems, each with its proper solution strategies. We assume that one type of incompatibility can
cause problems in different groups. For example, a syntax software incompatibility can stem from
different types of binding, which can require a special architectural solution for the whole system,
or it can be just a different order of parameters, which can be overcome by a simple wrapper. Thus,
we can differentiate the following groups of integration problems:

• Functional. All the 1-order semantic-pragmatic incompatibilities that are caused by missing or
wrong functionality. Re-implementation or modification of faulty components can solve these
problems.

• Non-functional. Some 1-order semantic-pragmatic incompatibilities can be caused by a failure
to meet non-functional requirements, such as reliability, maintainability, efficiency,
usability, etc. These problems are difficult to solve without reworking the component.

• Architectural. These problems can force changes to the overall system’s architecture, although
the incompatibilities that cause them vary. In this work
we consider the following architectural assumptions of software components with their
respective incompatibilities: packaging (syntax development and target environments), control
(n-order semantic-pragmatic target environment), information flow (syntax software), binding
(syntax software), synchronization (2-order semantic-pragmatic software) [Shaw 95],
[Yakimovich et al. 99].

• Conflicts. Problems of this type are conflicts between components in the system (e.g., 
deadlocks). The related incompatibilities are n-order semantic-pragmatic software and 
hardware. The possible solutions can include changing the system’s configuration without 
changing the overall architectural type (minor architectural changes, including monitoring 
components) and using glueware. 

• Interface. These problems are incompatible interfaces between the components caused by
some syntax and 2-order semantic-pragmatic software and hardware incompatibilities (other
than major architectural). The possible solution is glueware.
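
One way to read the five groups is as a lookup from Table 1 cells to candidate problem groups. The Python sketch below encodes only the cells the text names explicitly. The cell sets are an interpretation made for illustration: the text says only "some" 1-order incompatibilities are non-functional, and cells it does not mention (for example 3.2b) are left unassigned.

```python
# Cells of Table 1 named in the text for each problem group (an illustrative reading).
GROUPS = {
    "functional": {"1.2a", "2.2a", "3.2a", "4.2a"},
    "non-functional": {"1.2a", "2.2a", "3.2a", "4.2a"},
    "architectural": {"1.1", "3.1", "4.1", "1.2b", "4.2c"},
    "conflicts": {"1.2c", "2.2c"},
    "interface": {"1.1", "2.1", "1.2b", "2.2b"},
}


def candidate_groups(cell: str) -> list:
    """Return the problem groups that the text associates with a Table 1 cell."""
    return sorted(group for group, cells in GROUPS.items() if cell in cells)
```

A cell such as 1.1 maps to more than one group, which reflects the assumption stated above that one type of incompatibility can cause problems in different groups.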


Another property of this high-level classification is that the classes of problems are specific to
particular development phases. Functional and non-functional issues require information on the
project and COTS product functionality, which is available early in the requirements analysis
phase. Architectural issues are dealt with during the design phase when the system’s architecture is
being designed. Conflicts and interface issues are addressed later in the design phase when the
system’s architecture and the component’s interfaces are known.

Let us consider the following example to illustrate our approach: a 3D-graphics engine is being
chosen for a real-time system. The system being developed imposes the following high-level
requirements for the graphics engine:

Functionality: drawing 3-dimensional objects, including input and output of 3D images from files.
Non-functional issues (portability): Mac.

Architectural issues (development platform): Ada 95.

Interfaces (example of a function): procedure Rect(x, y, w, h: Real); where (x,y) - the coordinates
of the bottom-left corner of the rectangle; w - its width; h - its height; output - drawing a rectangle.
Other specifications, such as non-functional requirements, hardware requirements, possible
conflicts, etc., are not considered in this example.

The possible candidate COTS products are OpenGL, QuickDraw3D, and DirectX [Thompson 96]. 
Matching them against the requirements gives the following data: 

OpenGL:

Functionality: the drawing functions are provided, input and output from files is not
supported - 1-order semantic-pragmatic incompatibility.

Non-functional issues (portability): Mac platform is supported.

Architectural issues (development platform): an Ada implementation is available.

Interface: procedure glRectf(x1: GLfloat; y1: GLfloat; x2: GLfloat; y2: GLfloat); where
(x1,y1) - the coordinates of one vertex of the rectangle; (x2,y2) - the coordinates of the
opposite vertex of the rectangle. There is a syntax incompatibility (different procedure
names) and a 2-order semantic-pragmatic incompatibility (different interpretations of the
arguments) with software components.

QuickDraw3D: 

Functionality: drawing provided, input and output from files is supported. 

Non-functional issues (portability): Mac platform is supported. 

Architectural issues (packaging): Ada 95 implementation is not available - 2-order 
semantic-pragmatic incompatibility with the development platform. 

Interface: it is not necessary to consider it, because it is expensive to use QuickDraw3D due 
to the different packaging. 

DirectX:

Functionality: drawing provided, input and output from files is supported.

Non-functional issues (portability): Mac platform is not supported - 2-order semantic-pragmatic
incompatibility with the target platform.

Architectural issues (packaging): Ada 95 implementation is not available - 2-order
semantic-pragmatic incompatibility with the development platform.

Interface: it is not necessary to consider it, because it is extremely expensive to use DirectX
due to the different packaging and target platform.

The result of this comparison is that OpenGL is the best candidate, despite certain incompatibilities
that can be overcome using glueware and re-implementation. Use of the C-implemented QuickDraw3D
would require changing the system’s architecture. Use of DirectX would require porting it to Mac,
which is hardly feasible.
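
The glueware needed for the rectangle interface can be sketched as a thin wrapper that renames the call and converts the arguments. The sketch is written in Python for brevity rather than in Ada 95, and `gl_rectf` is a stand-in that merely records its arguments; it is not the real OpenGL binding.

```python
def gl_rectf(x1: float, y1: float, x2: float, y2: float) -> tuple:
    """Stand-in for the COTS call: takes two opposite vertices of the rectangle."""
    return ("rect", (x1, y1), (x2, y2))


def rect(x: float, y: float, w: float, h: float) -> tuple:
    """Glueware exposing the interface the system expects.

    (x, y) is the bottom-left corner; w and h are the width and height.
    The wrapper resolves the syntax incompatibility (different name) and the
    2-order incompatibility (different interpretation of the arguments).
    """
    return gl_rectf(x, y, x + w, y + h)
```

Calling `rect(1.0, 2.0, 3.0, 4.0)` forwards the vertices (1.0, 2.0) and (4.0, 6.0) to the stand-in, so the rest of the system never sees the COTS interface.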


4. CONCLUSIONS AND ONGOING WORK.

In this paper we presented a classification of incompatibilities between software (including COTS) 
components and other parts of a software system. This classification is intended to find the possible 
problems, including functional, architectural, non-functional, conflict, and interface, when a COTS 
software component is being integrated into a system. We hope that the incompatibility 
classification and the effort estimation approach can be useful for software developers to evaluate 
and integrate COTS software. 

We have given above a classification of possible incompatibilities between the software (COTS) 
and other system components. However, to select a COTS product, developers must also know the 
effort required for overcoming these incompatibilities. To estimate the integration effort developers 
have to answer the following sequence of questions: 

- What are the incompatibilities? That is, what is the difference between the system's requirements
and the COTS products? This difference can be found using approaches such as the comprehensive
reuse model [Basili, Rombach 91].

- How are they to be overcome? That is, what integration strategies can be used by the developers to
integrate the COTS software products (e.g., re-implementation, glueware, changes of architecture)?

- What is the amount of integration work? This is a quantitative estimation of the two items
above: how much work is to be done to fill a certain gap.

- What is the productivity (skill) of the developers for the applied integration strategy? This
reflects the skill of the developers with respect to particular integration tasks. The higher it is, the
faster they can perform the same amount of work. It may be possible to define techniques within
different strategies, for example, re-implementation using an object-oriented, procedural, or another
paradigm. Specifying techniques within the strategies will demand more data about the
organization, but on the other hand, the analysis will be more fine-tuned.

- What is the effort required for overcoming a particular incompatibility between a COTS product
and the system? This is obtained from the previous two items by dividing the amount of work by
the productivity.

- What is the total effort required for integrating a COTS product? This is the sum of the efforts
required for resolving all the incompatibilities between the COTS product and the system.

Essentially, this is a bottom-up effort estimation model: each of the COTS product's components is
analyzed with respect to all its possible interactions with the system into which it is to be integrated.
If an incompatibility is found, the effort to overcome it is estimated based on the amount of integration
work and the productivity of the organization for this type of work. The overall integration cost is the
sum of the efforts of overcoming all the incompatibilities between the COTS product’s components and the
system. However, to develop this COTS evaluation approach we must find effective ways to
measure the productivity and the gap between the requirements and the system being developed.
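
The arithmetic of this bottom-up model can be sketched in a few lines of Python: the effort for one incompatibility is the amount of work divided by the productivity, and the total is the sum over all incompatibilities. The class name, the units, and the sample figures are assumptions chosen for illustration; the paper does not prescribe how work or productivity is measured.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Incompatibility:
    name: str
    work: float          # amount of integration work, in hypothetical work units
    productivity: float  # work units per person-day for the chosen strategy

    def effort(self) -> float:
        """Effort in person-days: amount of work divided by productivity."""
        return self.work / self.productivity


def total_effort(items: Iterable[Incompatibility]) -> float:
    """Total integration effort: sum of the efforts for all incompatibilities."""
    return sum(item.effort() for item in items)


# Made-up figures for the OpenGL candidate of Section 3.
opengl = [
    Incompatibility("file input/output (re-implementation)", 120.0, 4.0),
    Incompatibility("Rect interface (glueware)", 10.0, 5.0),
]
```

With these invented figures, `total_effort(opengl)` evaluates to 32.0 person-days; comparing such totals across candidates is how the model would support COTS selection.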

As ongoing research, a process model for COTS selection, evaluation, and integration is being
defined, incorporating the ideas shown in this paper. Some experiments have been planned to
empirically validate such a model. The results of these experiments, and the whole model, will be
described in future publications.